nucleic acid binding polypeptides characterized by flexible linkers connected
nucleic acid binding modules

Field of the Invention 

This invention relates to linkers for linking together nucleic acid binding
polypeptide modules. This invention further relates to nucleic acid binding
polypeptides, in particular nucleic acid binding polypeptides capable of binding
sequences separated by one or more gaps of varying sizes, and methods for designing
such polypeptides.

Background of the Invention 

Protein-nucleic acid recognition is a commonplace phenomenon which is
central to a large number of biomolecular control mechanisms which regulate the
functioning of eukaryotic and prokaryotic cells. For instance, protein-DNA
interactions form the basis of the regulation of gene expression and are thus one of the
subjects most widely studied by molecular biologists. Many DNA-binding proteins
contain independently folded domains for the recognition of DNA, and these domains
in turn belong to a large number of structural families, such as the leucine zipper, the
"helix-turn-helix" and zinc finger families. Despite the great variety of structural
domains, the specificity of the interactions observed to date between protein and DNA
most often derives from the complementarity of the surfaces of a protein α-helix and
the major groove of DNA (Klug, 1993, Gene 135:83-92).

Zinc finger proteins are ubiquitous eukaryotic DNA-binding modules first
identified in Xenopus transcription factor IIIA (TFIIIA). Each zinc finger protein
consists of a number of autonomous DNA binding units. For example, the mouse
Zif268 zinc finger protein is a protein of 90 amino acid residues belonging to the
Cys2-His2 zinc finger family. Zif268 contains three independent zinc finger domains
of 24 residues each. Each zinc finger domain ("finger") consists of a single α helix
joined to two strands of antiparallel β-sheet and held together via chelation of a zinc
ion (Pavletich and Pabo, 1991, Science 252, 809-817). Sequence-specific DNA binding
is mediated by residues located on the exposed face of the α helix, which interacts
with the major groove of DNA. One zinc finger domain interacts with about three base
pairs, so that a number of fingers, which are linked together by linkers, are required
to bind a longer DNA sequence. The linkers of various zinc finger proteins have been
compared, and a consensus sequence (the "canonical sequence") determined, consisting
of four amino acids Gly-Glu-Lys-Pro. This canonical linker is termed the "GEKP
linker". However, variants of this sequence are possible, for example,
Gly-Gln-Lys-Pro, Gly-Glu-Arg-Pro and Gly-Gln-Arg-Pro.
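
The relationship between target length and finger count described above can be sketched as follows. This is an illustrative approximation only: the figure of three base pairs per finger is the rough value given in the text, and the names BP_PER_FINGER, CANONICAL_LINKERS and fingers_needed are hypothetical labels introduced for this sketch, not terms used in the specification.

```python
import math

# Approximate value taken from the text ("about three base pairs" per finger).
BP_PER_FINGER = 3

# The canonical linker and the variants listed in the text (one-letter codes).
CANONICAL_LINKERS = ("GEKP", "GQKP", "GERP", "GQRP")


def fingers_needed(target_length_bp: int) -> int:
    """Rough number of zinc finger modules needed to cover a contiguous site."""
    if target_length_bp <= 0:
        raise ValueError("target length must be a positive number of base pairs")
    return math.ceil(target_length_bp / BP_PER_FINGER)


def linkers_needed(target_length_bp: int) -> int:
    """Number of inter-finger linkers joining that many fingers in tandem."""
    return fingers_needed(target_length_bp) - 1
```

Under these assumptions a 9 base pair site calls for three fingers joined by two linkers, and an 18 base pair site calls for six fingers joined by five linkers.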

It has been suggested that the contacts between particular amino acids and
DNA base sequence may be described by a simple set of rules. However, current
methods for the design and selection of zinc finger modules are not generally capable
of producing zinc finger proteins that are capable of binding to any given DNA
sequence. This is because certain nucleotide sequences will constitute favourable
binding sites for zinc finger binding. It is known, for example, that DNA sequences
which contain G-rich regions are highly specific binding sites for zinc finger proteins.
In particular, zinc fingers tend to bind DNA sequences which contain G at every third
position with high specificity. On the other hand, with regard to other sequences it will
be difficult or impossible to design zinc fingers which bind specifically to that
sequence. Thus, for example, pyrimidine-rich DNA sequences comprise less
favourable binding sites for zinc fingers. In order to increase the affinity and
specificity of binding, it is therefore desirable to construct zinc fingers which will
tolerate gaps between the nucleotide sequences which are contacted by the fingers.
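
As a rough illustration of the "G at every third position" observation, the following sketch checks whether a candidate site has G at every third base in at least one of the three possible frames. The function name has_g_every_third is hypothetical, and the check is a simplification made for illustration: it says nothing about actual binding affinity or specificity.

```python
def has_g_every_third(site: str) -> bool:
    """Return True if some frame of `site` has G at every third position."""
    site = site.upper()
    if len(site) < 3:
        # Too short to say anything meaningful about a triplet pattern.
        return False
    # Try each of the three possible frames (offsets 0, 1 and 2).
    return any(
        all(base == "G" for base in site[offset::3])
        for offset in range(3)
    )
```

For example, the Zif268 site GCGTGGGCG has G at the third base of every triplet, whereas a pyrimidine-rich sequence such as TTCTTCTTC has no frame with this pattern.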

It is known in the prior art to attempt to increase affinity and specificity of zinc
finger binding by linking together separate zinc finger domains with a canonical
sequence. Thus, Rebar (1997, PhD Thesis, Massachusetts Institute of Technology,
Massachusetts, USA) and Shi (1995, PhD Thesis, Johns Hopkins University,
Maryland, USA) describe linking additional fingers to a three-finger protein using a
GERP linker, and observe a relatively modest increase in affinity. Furthermore,
tandem linkage of two three-finger proteins using a canonical linker has been
described by Liu et al (1997), Proc. Natl. Acad. Sci. USA 94, 5525-5530. The affinity
of binding of this six finger protein is found to be increased approximately 68-74 fold
relative to each three-finger peptide, which is a poor result compared to that predicted
by theory. A different approach is described by Kim and Pabo (1998, Proc. Natl. Acad.
Sci. USA 95, 2812-2817), who use structure based design to generate a six-finger
construct, using flexible linkers comprising 8 or 11 amino acids to link two three
finger peptides (Zif268 and NRE). However, this construct is only capable of spanning
a single gap (comprising 0-2 base pairs) in the composite DNA target site. Structure
based design has also been used to construct a fusion protein consisting of zinc fingers
from Zif268 and the homeodomain from Oct-1 (Pomerantz et al., 1995, Science 267,
93-6). Thus, in summary, to date, several groups have created six (or nine) -finger
fusion peptides to bind long stretches of DNA with high affinity (Kim, J-S. & Pabo, C.
O. (1998) Proc. Natl. Acad. Sci. USA 95, 2812-2817; Liu, Q., Segal, D. J., Ghiara, J.
B. & Barbas, C. F. III (1997) Proc. Natl. Acad. Sci. USA 94, 5525-5530; Kamiuchi, T.,
Abe, E., Imanishi, M., Kaji, T., Nagaoka, M. & Sugiura, Y. (1998) Biochemistry 37,
13827-13834). However, the affinities of these constructs vary greatly and have
generally been far weaker than expected. In addition, all of these peptides have
targeted either contiguous DNA sequences, or those containing just one or two
nucleotides of unbound DNA.

It is therefore an object of the present invention to provide nucleic acid binding
polypeptides which are capable of spanning longer gaps between DNA binding
subsites. It is a further object of the invention to provide nucleic acid binding
polypeptides which are capable of spanning a greater number of gaps between the
DNA binding subsites. It is a yet further object of the invention to provide nucleic acid
binding polypeptides which are capable of spanning variable gaps between DNA
binding subsites.

Summary of the Invention 

The invention in general provides for the use of linkers to link two or more
nucleic acid binding domains. The linkers according to the invention are non-canonical
linkers, which are flexible or structured. According to the invention in its various
aspects, we provide methods of producing a modified nucleic acid binding polypeptide,
nucleic acid binding polypeptides as made by such a method, nucleic acid binding
polypeptides, nucleic acids encoding such nucleic acid binding polypeptides, host cells
transformed with such nucleic acids, pharmaceutical compositions comprising such
polypeptides or such nucleic acids, and uses of certain linkers.

According to a first aspect of the invention, we provide nucleic acid binding
proteins comprising nucleic acid binding domains linked by flexible linkers. This
aspect of the invention is summarised by the following paragraphs:

We describe a method of producing a modified nucleic acid binding
polypeptide, the method comprising the steps of: (a) providing a nucleic acid binding
polypeptide comprising a plurality of nucleic acid binding modules; (b) selecting a
first binding domain consisting of one or two contiguous nucleic acid binding
modules; (c) selecting a second binding domain consisting of one or two contiguous
nucleic acid binding modules; and (d) introducing a linker sequence to link the first
and second binding domains, the linker sequence comprising five or more amino acid
residues. Preferably, the linker sequence is a flexible linker sequence.

Preferably, steps (b) to (d) are repeated. More preferably, the binding
affinity and/or specificity of the modified polypeptide to a nucleic acid sequence is
increased compared to the binding affinity and/or specificity of an unmodified
polypeptide.

Preferably, the nucleic acid sequence comprises a sequence which is bound by
the unmodified polypeptide. More preferably, the nucleic acid sequence comprises a
sequence bound by the unmodified nucleic acid binding polypeptide, into which one or
more nucleic acid residues has been inserted. Most preferably, the nucleic acid
residue(s) are inserted between target subsites bound by the first and second binding
domains of the unmodified polypeptide.

We further describe a method of making a nucleic acid binding polypeptide,
the method comprising the steps of: (a) providing a first binding domain and a second
binding domain, at least one of the first and second binding domains consisting of one
or two nucleic acid binding module(s); and (b) linking the first and second binding
domains with a linker sequence comprising five or more amino acid residues.

We further describe a nucleic acid binding polypeptide comprising a first 
binding domain and a second binding domain linked by a linker sequence comprising 
five or more amino acid residues, in which at least one of the first and second binding 
domains consists of one or two nucleic acid binding module(s). 

The method or polypeptide may be one in which the nucleic acid binding
module is a zinc finger of the Cys2-His2 type. Preferably, the nucleic acid binding
module is selected from the group consisting of naturally occurring zinc fingers and
consensus zinc fingers. Most preferably, the nucleic acid binding polypeptide is
Zif-GAC.

Preferably, the method or polypeptide is such that each of the first and the
second binding domains consists of two binding modules. More preferably, the linker
sequence comprises between 5 and 8 amino acid residues.

Preferably, the linker sequence is provided by insertion of one or more amino
acid residues into a canonical linker sequence. The canonical linker sequence may be
selected from GEKP, GERP, GQKP and GQRP. Preferably, the linker sequence
comprises a sequence selected from: GGEKP, GGQKP, GGSGEKP, GGSGQKP,
GGSGGSGEKP, and GGSGGSGQKP.
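
The derivation of extended linkers from canonical linkers can be illustrated with a short sketch. The function name extend_linker and the choice of insertion position (after the first glycine) are assumptions made for illustration; the text states only that residues are inserted into a canonical linker sequence, and the same products could equally be described as insertions at other positions.

```python
CANONICAL_LINKERS = ("GEKP", "GERP", "GQKP", "GQRP")


def extend_linker(canonical: str, insert: str, position: int = 1) -> str:
    """Insert `insert` into a canonical linker at `position` (0-based)."""
    if canonical not in CANONICAL_LINKERS:
        raise ValueError(f"{canonical!r} is not one of the listed canonical linkers")
    if not 0 <= position <= len(canonical):
        raise ValueError("insertion position lies outside the linker")
    return canonical[:position] + insert + canonical[position:]


# Inserts that reproduce the extended linkers listed in the text
# when placed after the first glycine of GEKP or GQKP.
INSERTS = ("G", "GSG", "GSGGSG")

extended = {
    canonical: [extend_linker(canonical, ins) for ins in INSERTS]
    for canonical in ("GEKP", "GQKP")
}
```

With these inputs the sketch yields GGEKP, GGSGEKP and GGSGGSGEKP from GEKP (5, 7 and 10 residues respectively), and the corresponding GQKP series.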

Preferably, the nucleic acid binding polypeptide comprises a nucleic acid
sequence selected from SEQ ID Nos: 22, 23, 24, 25, 26 and 27.

We further describe a nucleic acid binding polypeptide produced by a method
as described above, a nucleic acid encoding a nucleic acid binding polypeptide as
described above, and a host cell transformed with a nucleic acid as described above.

We further describe a pharmaceutical composition comprising a polypeptide as
described above or a nucleic acid as described above, together with a pharmaceutically
acceptable carrier.

We further describe a nucleic acid binding polypeptide comprising a repressor
domain and a plurality of nucleic acid binding domains, the nucleic acid binding
domains being linked by at least one non-canonical linker. The repressor domain may
be a transcriptional repressor domain selected from the group consisting of: a KRAB-A
domain, an engrailed domain and a snag domain. Preferably, the nucleic acid
binding domains are linked by at least one flexible linker.

According to a second aspect of the invention, we provide nucleic acid binding
proteins comprising nucleic acid binding domains linked by structured linkers. This
aspect of the invention is summarised by the following paragraphs:

We describe a method of producing a modified nucleic acid binding
polypeptide, the method comprising the steps of: (a) providing a nucleic acid binding
polypeptide comprising a plurality of nucleic acid binding modules; (b) selecting a
first binding domain comprising a nucleic acid binding module; (c) selecting a second
binding domain comprising a nucleic acid binding module; and (d) introducing a linker
sequence comprising a structured linker to link the first and second binding domains.

Preferably, steps (b) to (d) are repeated. More preferably, the binding affinity 
and/or specificity of the modified polypeptide to a nucleic acid sequence is increased 
compared to the binding affinity and/or specificity of an unmodified polypeptide. 

Preferably, the nucleic acid sequence comprises a sequence which is bound by
the unmodified polypeptide. More preferably, the nucleic acid sequence comprises a
sequence bound by the unmodified nucleic acid binding polypeptide, into which one or
more nucleic acid residues has been inserted. Most preferably, the nucleic acid
residue(s) are inserted between target subsites bound by the first and second binding
domains of the unmodified polypeptide. The number of inserted nucleic acid residues
may be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or 11 or more.

We further describe a method of making a nucleic acid binding polypeptide,
the method comprising the steps of: (a) providing a first binding domain comprising a
nucleic acid binding module; (b) providing a second binding domain comprising a
nucleic acid binding module; and (c) linking the first and second binding domains with
a linker sequence comprising a structured linker.

We further describe a non-naturally occurring nucleic acid binding
polypeptide comprising a first binding domain comprising a nucleic acid binding
module and a second binding domain comprising a nucleic acid binding module, the
first and second binding domains being linked by a linker sequence comprising a
structured linker.

Preferably, the nucleic acid binding module is a zinc finger of the Cys2-His2
type. More preferably, the method or polypeptide is one in which the nucleic acid
binding module is selected from the group consisting of naturally occurring zinc
fingers and consensus zinc fingers.

Preferably, the structured linker comprises an amino acid sequence which is
not capable of specifically binding nucleic acid. More preferably, the structured linker
is derived from a zinc finger by mutation of one or more of its base contacting residues
to reduce or abolish nucleic acid binding activity of the zinc finger. The structured
linker may comprise the amino acid sequence of TFIIIA finger IV. Alternatively, the
zinc finger is finger 2 of wild type Zif268 mutated at positions -1, 2, 3 and 6.

Preferably, the method or polypeptide is one in which the first or second
nucleic acid binding domain is selected from the group consisting of: fingers 1 to 3 of
TFIIIA, GAC and Zif. More preferably, the nucleic acid binding polypeptide
comprises substantially the sequence of TF(1-4)-ZIF (SEQ ID NO: 53), GAC-F4-Zif
(SEQ ID NO: 54) or Zif-ZnF-GAC (SEQ ID NO: 55). Most preferably, the or each
linker sequence comprises one or more further sequence(s), each further sequence
comprising a canonical linker sequence, preferably GEKP, GERP, GQKP or GQRP,
optionally comprising one or more amino acid sequences inserted into the canonical
sequence. The further sequences may be selected from: GGEKP, GGQKP,
GGSGEKP, GGSGQKP, GGSGGSGEKP, and GGSGGSGQKP.

We further describe a nucleic acid binding polypeptide produced by any of the
methods described above, a nucleic acid encoding a nucleic acid binding polypeptide
as described above, and a host cell transformed with a nucleic acid as described above.

We further describe a pharmaceutical composition comprising a polypeptide as
described above or a nucleic acid as described above, together with a pharmaceutically
acceptable carrier.

We further describe the use of a structured linker in a method of making a
nucleic acid binding polypeptide. The structured linker may separate first and second
nucleic acid binding domains of the nucleic acid binding polypeptide, to enable the
polypeptide to bind a nucleic acid target in which subsites bound by respective
domains of the polypeptide are separated by one or more nucleic acid residues.

We further describe a nucleic acid binding polypeptide comprising a repressor
domain and a plurality of nucleic acid binding domains, the nucleic acid binding
domains being linked by at least one non-canonical linker. The repressor domain may
be a transcriptional repressor domain selected from the group consisting of: a KRAB-A
domain, an engrailed domain and a snag domain. The nucleic acid binding domains
may be linked by at least one structured linker.

According to a third aspect of the invention, we provide nucleic acid binding
proteins comprising nucleic acid binding domains linked by structured and flexible
linkers in any combination. This aspect of the invention is summarised by the
following paragraphs:

We describe a method of producing a modified nucleic acid binding
polypeptide, the method comprising the steps of: (a) providing a nucleic acid
binding polypeptide comprising a plurality of nucleic acid binding modules; (b)
selecting a first binding domain consisting of one or two contiguous nucleic acid
binding modules; (c) selecting a second binding domain consisting of one or two
contiguous nucleic acid binding modules; (d) introducing a first linker sequence to link
the first and second binding domains, the linker sequence comprising five or more
amino acid residues; (e) selecting a third binding domain comprising a nucleic acid
binding module; (f) selecting a fourth binding domain comprising a nucleic acid
binding module; and (g) introducing a second linker sequence comprising a structured
linker to link the third and fourth binding domains.

Preferably, steps (b) to (d) are repeated. More preferably, steps (e) to (g) are
repeated. Preferably, the binding affinity and/or specificity of the modified polypeptide
to a nucleic acid sequence is increased compared to the binding affinity and/or
specificity of an unmodified polypeptide.

Preferably, the nucleic acid sequence comprises a sequence which is bound by
the unmodified polypeptide. More preferably, the nucleic acid sequence comprises a
sequence bound by the unmodified nucleic acid binding polypeptide, into which one or
more nucleic acid residues has been inserted. Most preferably, the nucleic acid
residue(s) are inserted between target subsites bound by the first and second binding
domains of the unmodified polypeptide. The number of inserted nucleic acid residues
may be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or 11 or more.

We also describe a method of making a nucleic acid binding polypeptide, the
method comprising the steps of: (a) providing a first binding domain and a second
binding domain, at least one of the first and second binding domains consisting of one
or two nucleic acid binding module(s); (b) linking the first and second binding
domains with a first linker sequence comprising five or more amino acid residues; (c)
providing a third binding domain comprising a nucleic acid binding module; (d)
providing a fourth binding domain comprising a nucleic acid binding module; and (e)
linking the third and fourth binding domains with a second linker sequence comprising
a structured linker.

We further describe a nucleic acid binding polypeptide comprising a first
binding domain consisting of one or two contiguous nucleic acid binding modules and
a second binding domain consisting of one or two contiguous nucleic acid binding
modules, the first and second binding domains being linked by a first linker sequence
comprising five or more amino acid residues; a third binding domain comprising a
nucleic acid binding module and a fourth binding domain comprising a nucleic acid
binding module, the third and fourth binding domains being linked by a second linker
sequence comprising a structured linker.

In the methods and polypeptides described above, the first linker sequence may
comprise a flexible linker. Preferably, the nucleic acid binding module is a zinc finger
of the Cys2-His2 type. More preferably, the nucleic acid binding module is selected
from the group consisting of naturally occurring zinc fingers and consensus zinc
fingers.

Preferably, each of the first and the second binding domains consists of two
binding modules. More preferably, the first linker sequence comprises between 5 and 8
amino acid residues. The first linker sequence may be provided by insertion of one or
more amino acid residues into a canonical linker sequence. Preferably, the canonical
linker sequence is selected from GEKP, GERP, GQKP and GQRP. More preferably,
the first linker sequence comprises a sequence selected from: GGEKP, GGQKP,
GGSGEKP, GGSGQKP, GGSGGSGEKP, and GGSGGSGQKP. Most preferably, the
nucleic acid binding polypeptide comprises a nucleic acid sequence selected from SEQ
ID Nos: 22, 23, 24, 25, 26 and 27.

Preferably, the structured linker comprises an amino acid sequence which is
not capable of specifically binding nucleic acid. More preferably, the structured linker
comprises the amino acid sequence of TFIIIA finger IV. Alternatively, or in addition,
the structured linker is derived from a zinc finger by mutation of one or more of its
base contacting residues to reduce or abolish nucleic acid binding activity of the zinc
finger. The zinc finger may be finger 2 of wild type Zif268 mutated at positions -1, 2,
3 and 6. Preferably, the third or fourth nucleic acid binding domain is selected from the
group consisting of: fingers 1 to 3 of TFIIIA, GAC and Zif.

Preferably, the method or polypeptide as described above is one in which the
nucleic acid binding polypeptide comprises substantially the sequence of TF(1-4)-ZIF
(SEQ ID NO: 53), GAC-F4-Zif (SEQ ID NO: 54) or Zif-ZnF-GAC (SEQ ID NO: 55).
The second linker sequence may comprise one or more further sequence(s), each
further sequence comprising a canonical linker sequence, preferably GEKP, GERP,
GQKP or GQRP, optionally comprising one or more amino acid sequences inserted
into the canonical sequence. The further sequences may be selected from: GGEKP,
GGQKP, GGSGEKP, GGSGQKP, GGSGGSGEKP, and GGSGGSGQKP.

We further describe a nucleic acid binding polypeptide produced by a method 
as described above, a nucleic acid encoding a nucleic acid binding polypeptide as 
described above, and a host cell transformed with a nucleic acid as described above. 

We further describe a pharmaceutical composition comprising a polypeptide as
described above, or a nucleic acid as described above, together with a
pharmaceutically acceptable carrier.

We further describe a nucleic acid binding polypeptide comprising a repressor
domain and a plurality of nucleic acid binding domains, the nucleic acid binding
domains being linked by at least one flexible linker and by at least one structured
linker.

We further describe a nucleic acid binding polypeptide in which the repressor
domain is a transcriptional repressor domain selected from the group consisting of: a
KRAB-A domain, an engrailed domain and a snag domain. The nucleic acid binding
domains may be linked by at least one flexible linker, or they may be linked by at least
one structured linker.

According to a further aspect of the invention, we provide the use of a nucleic 
acid binding domain comprising two zinc finger modules as a basic unit in the 
construction of a nucleic acid binding polypeptide. 

According to a yet further aspect of the invention, we provide a method of
producing a nucleic acid binding polypeptide, the method comprising providing a first
and a second nucleic acid binding domain each comprising two zinc finger modules,
and linking the first and second nucleic acid binding domains with a structured linker
sequence or a flexible linker sequence.

According to a yet further aspect of the invention, we provide the use of an
amino acid sequence comprising five or more amino acid residues as a flexible linker
to join two or more nucleic acid binding domains comprising two zinc finger modules.

According to a yet further aspect of the invention, we provide the use of an amino acid
sequence comprising a zinc finger which is not capable of specifically binding nucleic
acid, as a structured linker to join two or more nucleic acid binding domains
comprising two zinc finger modules. The nucleic acid binding domain is preferably
selected from a zinc finger polypeptide library, in which each polypeptide in the
library comprises more than one zinc finger and wherein each polypeptide has been at
least partially randomised such that the randomisation extends to cover the overlap of a
single pair of zinc fingers.

According to a yet further aspect of the invention, we provide a method for
producing nucleic acid binding domains comprising two zinc finger modules for use in
constructing a nucleic acid binding polypeptide, the method comprising the steps of:
(a) providing a zinc finger polypeptide library, in which each polypeptide in the library
comprises more than one zinc finger and wherein each polypeptide has been at least
partially randomised such that the randomisation extends to cover the overlap of a
single pair of zinc fingers; (b) providing a nucleic acid sequence comprising at least 6
nucleotides; and (c) selecting sequences in the zinc finger library which are capable of
binding to the nucleic acid sequence. Preferably, substantially one and a half zinc
fingers are randomised in each polypeptide.

According to a yet further aspect of the invention, we provide a nucleic acid
binding polypeptide comprising units of zinc finger binding domains linked by flexible
and/or structured linkers, each zinc finger binding domain comprising two zinc finger
modules.

Brief Description of the Drawings 

Figure 1 is a schematic diagram showing the construction of the 3x2F and
ZIF-GAC zinc finger constructs described here. Step 1: PCR using primer pairs A + a,
B + b, C + c, D + d. Step 2: Overlap PCR; template fill-in and amplification with end
primers A + b, C + d. Step 3: Digestion with EagI, ligation of resulting products;
digestion of full-length product with NdeI + NotI, ligation into pCITE vector.

Figure 2 shows the nucleic acid and amino acid sequence of the ZIF-GAC 
fusion construct (SEQ ID NO: 21), which is made by joining the third finger of wild- 
type ZIF to the first finger of the GAC clone using the peptide LRQKDGERP. 

Figure 3 shows the nucleic acid and amino acid sequence of the 3x2F ZGS
construct (SEQ ID NO: 22).

Figure 4 shows the nucleic acid and amino acid sequence of the 3x2F ZGL
construct (SEQ ID NO: 23).

Figure 5 shows the nucleic acid and amino acid sequence of the 3x2F ZGXL
construct (SEQ ID NO: 24).

Figure 6 shows the nucleic acid and amino acid sequence of the 3x2F ZGSL
construct (SEQ ID NO: 25).

Figure 7 shows the nucleic acid and amino acid sequence of the 3x2F ZGLS 
construct (SEQ ID NO: 26). 

Figure 8 shows the nucleic acid and amino acid sequence of the 3x1F ZIF
construct (SEQ ID NO: 27).

Figure 9A shows results of gel-shift experiments in which the 2x3F ZIF-GAC
peptide is tested for binding to either the 9 bp ZIF site alone (target bsA) or the
contiguous 18 bp ZIF-GAC site (target bsC).

Figure 9B shows results of gel-shift experiments in which the 3x2F ZGS and
3x2F ZGL peptides are tested for binding to target bsC. Serial 5-fold dilutions of
peptide are indicated by the black triangle (reactions corresponding to left-hand lanes
have less peptide than right-hand lanes), and binding site concentration is 0.13 nM.
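
The serial 5-fold dilution scheme referred to in these figure legends can be sketched as follows. The starting peptide concentration and the number of lanes are not given in this part of the text, so the values used in the usage note are purely hypothetical; the function name dilution_series is likewise introduced only for illustration.

```python
from typing import List


def dilution_series(top_concentration: float, lanes: int,
                    factor: float = 5.0) -> List[float]:
    """Peptide concentrations for a serial dilution, lowest first.

    Lowest-first ordering mirrors the figure legends, in which left-hand
    lanes contain less peptide than right-hand lanes.
    """
    if lanes < 1 or factor <= 1 or top_concentration <= 0:
        raise ValueError("need at least one lane, factor > 1 and a positive top concentration")
    return [top_concentration / factor ** i for i in reversed(range(lanes))]
```

For instance, with a hypothetical top concentration of 125 (arbitrary units) and four lanes, the series is 1, 5, 25 and 125 from left to right.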

Figure 10 shows results of gel-shift experiments in which 3x2F ZGS and 3x2F
ZGL peptides are tested for binding to the non-contiguous target sequence, bsD. Serial
5-fold dilutions of peptide are indicated by the black triangle (reactions corresponding
to left-hand lanes have less peptide than right-hand lanes), and binding site
concentration is 0.13 nM.

Figure 11 shows results of gel-shift experiments in which 3x2F ZGXL peptide
is tested for binding to the contiguous and non-contiguous target sequences bsC, bsD
and bsE. Binding of 3x2F ZGS peptide to bsC is also shown for comparison. Serial
5-fold dilutions of peptide are indicated by the black triangle (reactions corresponding
to left-hand lanes have less peptide than right-hand lanes), and binding site
concentration is 0.13 nM.

Figure 12 shows results of gel-shift experiments in which 3x2F ZGSL peptide
is tested for binding to the 3x2F ZGXL binding site bsE, the 3x2F ZGL binding site
bsD and the 3x2F ZGSL binding site bsF. Serial 5-fold dilutions of peptide are
indicated by the black triangle (reactions corresponding to left-hand lanes have less
peptide than right-hand lanes), and binding site concentration is 0.10 nM.

Figure 13 is a schematic diagram showing the construction of the
TFIIIA(F1-4)-ZIF zinc finger construct described here. Step 1: PCR using primer pairs
A + a and B + b on wild type TFIIIA and wild type ZIF templates respectively. Step 2:
Overlap PCR; template fill-in and amplification with end primers A + b. Step 3:
Digestion with EagI, ligation of resulting products; digestion of full-length product
with NdeI + NotI, ligation into pCITE vector.

Figure 14 is a schematic diagram showing the construction of the GAC-F4-ZIF
zinc finger construct described here. Step 1: PCR using primer pairs C + c and D + d
on GAC clone and TFIIIA(F1-4)-ZIF templates respectively. Step 2: Overlap PCR;
template fill-in and amplification with end primers C + d. Step 3: Digestion with EagI,
ligation of resulting products; digestion of full-length product with NdeI + NotI,
ligation into pCITE vector.

Figure 15 shows the nucleic acid and amino acid sequence of the TF(F1-4)-ZIF
fusion construct (SEQ ID NO: 53).

Figure 16 shows the nucleic acid and amino acid sequence of the GAC-F4-ZIF
construct (SEQ ID NO: 54).

Figure 17 shows the nucleic acid and amino acid sequence of the ZIF-ZnF- 
GAC construct (SEQ ID NO: 55). 

Figure 18 shows results of gel-shift experiments in which the TFIIIA(F1-4)-ZIF
peptide is tested for binding to the ZIF binding site (target bsA), the full length
TFIIIA(F1-3)-ZIF site with 6 base pairs of intervening DNA, and the TF(F1-3)-ZIF
site with 7 base pairs of intervening DNA. Serial 5-fold dilutions of peptide are
indicated by the black triangle (reactions corresponding to left-hand lanes have less
peptide than right-hand lanes), and binding site concentration is 0.16 nM.

Figure 19 shows results of gel-shift experiments in which the GAC-F4-ZIF
peptide is tested for binding to the ZIF binding site (target bsA), and the full length
GAC-ZIF site with 8 base pairs of intervening DNA. Serial 5-fold dilutions of peptide
are indicated by the black triangle (reactions corresponding to left-hand lanes have less
peptide than right-hand lanes), and binding site concentration is 0.10 nM.

Figure 20 shows results of gel-shift experiments in which the GAC-F4-ZIF 
10 peptide is tested for binding to the ZIF binding site (target bsA), and the GAC-ZIF site 
with 9 base pairs of intervening DNA. Serial 5-fold dilutions of peptide are indicated 
by the black triangle (reactions corresponding to left-hand lanes have less peptide than 
right-hand lanes), and binding site concentration is 0.1 6nM. 

Figure 21 shows results of gel-shift experiments in which the ZIF-ZnF-GAC
peptide is tested for binding to the 9 base pair ZIF binding site (target bsA), the full
length 18 base pair ZIF-GAC binding site (bsC), and sites with 2, 3, 4 and 5 base pairs
between the ZIF and GAC-clone binding sites (labelled respectively Z2G, Z3G, Z4G
and Z5G). The nucleotide sequences of Z2G, Z3G, Z4G and Z5G are as follows: Z2G:
5' GCG GAC GCG gtG CGT GGG CG 3', Z3G: 5' GCG GAC GCG agt GCG TGG
GCG 3', Z4G: 5' GCG GAC GCG tagtGC GTG GGC G 3', Z5G: 5' GCG GAC GCG
cta gtG CGT GGG CG 3'. Serial 5-fold dilutions of peptide are indicated by the black
triangle (reactions corresponding to left-hand lanes have less peptide than right-hand
lanes), and binding site concentration is 0.10nM.

Figure 22 shows results of gel-shift experiments in which the 2x3F ZIF-GAC
peptide is tested for binding to the 9 base pair ZIF binding site (target bsA), the 18
base pair ZIF-GAC binding site (bsC) as well as bs1, bs2, bs3 and bs4, which
comprise the ZIF-GAC bsC sequence, but with the three base subsequence recognised
by finger 4 of 2x3F ZIF-GAC removed, and 0, 1, 2 or 3 base pairs respectively
inserted in its place. The nucleotide sequences of bs1, bs2, bs3 and bs4 are as follows:
bs1: GCG GAC GCG TGG GCG, bs2: GCG GAC t GCG TGG GCG, bs3: GCG GAC
tc GCG TGG GCG and bs4: GCG GAC atc GCG TGG GCG. Serial 5-fold dilutions of
peptide are indicated by the black triangle (reactions corresponding to left-hand lanes
have less peptide than right-hand lanes), and binding site concentration is 0.01nM.

Figure 23 shows results of gel-shift experiments in which the 3x2F ZGS
peptide is tested for binding to the 9 base pair ZIF binding site (target bsA), the full
length 18 base pair ZIF-GAC binding site, and sites bs1, bs2, bs3 and bs4 as indicated
above for Figure 22. Serial 5-fold dilutions of peptide are indicated by the black
triangle (reactions corresponding to left-hand lanes have less peptide than right-hand
lanes), and binding site concentration is 0.01nM.

Figure 24. The general structure of the six-finger arrays used in this study and
potential regions of non-bound DNA marked with an 'X'. (A) 2x3F peptide with 9 bp
subsites indicated, (B) 3x2F peptides with 6 bp subsites indicated.

Figure 25. A selection of DNA binding studies by gel-shift assay. The gels are
designed to give a comparison between the binding affinities of the 2x3F Zif-GAC and
3x2F ZGS peptides, and are not necessarily the gels used to quantify binding affinity.
For example, the amount of 123456 binding site shifted by each peptide is limited by
protein concentration, rather than Kd. Top: 5-fold dilutions of 2x3F Zif-GAC (from
800 pM-1.3 pM), against 2 pM binding sites. Bottom: 5-fold dilutions of 3x2F ZGS
(from 700 pM-1.1 pM), against 2 pM binding sites. The proposed binding modes of
the zinc finger peptides for each binding site are illustrated under each gel image.

Figure 26. A selection of DNA binding studies by gel-shift assay. (A) 5-fold
dilutions of TF(1-4)-ZIF (from 5.5 nM-9 pM), against 20 pM ZIF binding site; 2 pM
TF6Z and 2 pM TF7Z. (B) 5-fold dilutions of TF(1-3)-flex-ZIF (from 5 nM-8 pM),
against 20 pM ZIF and 2 pM TF7Z. (C) 5-fold dilutions of ZIF-ZnF-GAC (from 1 nM-
1.6 pM), against 10 pM ZIF; 0.4 pM ZM; 0.4 pM Z4M; 0.4 pM Z6M and 0.4 pM
Z8M.

Figure 27 shows the nucleic acid and amino acid sequence of the 2x3F pep11-9
construct.

Figure 28 shows the nucleic acid and amino acid sequence of the 3x2F pep11-9
construct.

Detailed Description of the Invention

The invention relates to modified nucleic acid binding polypeptides and
methods of producing these. A number of different novel nucleic acid binding
polypeptides are disclosed. Methods are also disclosed for modifying an existing
nucleic acid binding polypeptide comprising a plurality of nucleic acid binding
modules. Where the nucleic acid binding polypeptide is provided by modification of
an existing nucleic acid binding polypeptide, the binding affinity and/or specificity of
the modified polypeptide to a substrate may be as good as, or better than, the
corresponding binding affinity and/or specificity of the unmodified or starting nucleic
acid binding polypeptide to the same substrate.

Thus, the methods of our invention allow the production of nucleic acid
binding polypeptides with higher binding affinity, or higher binding specificity, or
both. As the term is used here, "specificity" means the ability of a nucleic acid binding
polypeptide to discriminate between two or more putative nucleic acid targets. The
higher its specificity, the less tolerant a nucleic acid binding polypeptide is to changes
to the nature of the target, for example, nucleotide insertions, deletions, mutations,
inversions, modifications (e.g., methylation, addition of a chemical moiety), etc. A
nucleic acid binding polypeptide with high specificity for a target sequence is more
discriminatory: it will likely bind to its target with a certain affinity (which may be a
high affinity), and is less likely to bind another target (which may comprise the target
with changes as described above).

The practice of the present invention will employ, unless otherwise indicated,
conventional techniques of chemistry, molecular biology, microbiology, recombinant
DNA and immunology, which are within the capabilities of a person of ordinary skill
in the art. Such techniques are explained in the literature. See, for example, J.
Sambrook, E. F. Fritsch, and T. Maniatis, 1989, Molecular Cloning: A Laboratory
Manual, Second Edition, Books 1-3, Cold Spring Harbor Laboratory Press; Ausubel,
F. M. et al. (1995 and periodic supplements), Current Protocols in Molecular Biology,
ch. 9, 13, and 16, John Wiley & Sons, New York, N.Y.; B. Roe, J. Crabtree, and A.
Kahn, 1996, DNA Isolation and Sequencing: Essential Techniques, John Wiley &
Sons; J. M. Polak and James O'D. McGee, 1990, In Situ Hybridization: Principles and
Practice, Oxford University Press; M. J. Gait (Editor), 1984, Oligonucleotide
Synthesis: A Practical Approach, IRL Press; and D. M. J. Lilley and J. E. Dahlberg,
1992, Methods of Enzymology: DNA Structure Part A: Synthesis and Physical
Analysis of DNA, Methods in Enzymology, Academic Press. Each of these general
texts is herein incorporated by reference.



In a first aspect, we disclose the use of "flexible" linkers to link nucleic acid
binding domains consisting of one or two nucleic acid binding modules. Thus, a
method according to this aspect of our invention involves selecting binding domains
within the nucleic acid binding polypeptide, each domain consisting of one or two
nucleic acid binding modules, and linking these by means of a flexible linker sequence
comprising five or more amino acid residues. Use of such flexible linkers allows the
binding domains to bind to their cognate binding sites in the nucleic acid even when
these are separated by one or more gaps, for example 2 gaps, of one, two, three or
more nucleic acid residues. Thus, the peptides according to this aspect of the invention
are capable of spanning two short gaps of unbound DNA, while still binding
with picomolar affinity to their target sites. In a highly preferred embodiment, the
number of nucleic acid binding modules in each of the first and second binding
domains is two.



Our invention is also based in part on the surprising discovery that use of linker
sequences which adopt a specific conformational structure, rather than flexible linkers,
to link two nucleic acid binding modules or domains results in modified nucleic acid
binding polypeptides having improved binding characteristics. Such modified
polypeptides are capable of binding nucleic acid targets comprising one or more
relatively wide gaps of varying sizes inserted between target subsites.

In a second aspect, therefore, we disclose the use of "structured" linkers to link
nucleic acid binding domains comprising at least one nucleic acid binding module.
Thus, a method according to this aspect of our invention involves selecting binding
domains within the nucleic acid binding polypeptide, each domain comprising one or
more nucleic acid binding modules, and introducing a linker sequence comprising a
structured linker to link the binding domains. By the use of such structured linkers, the
binding domains in the modified nucleic acid binding polypeptide are able to bind to
their cognate binding sites in the nucleic acid even when these are separated by gaps of
five or more nucleic acid residues.

The terms "flexible linker" and "structured linker" will be described and 
explained in further detail below. 

A nucleic acid binding polypeptide may also be made which comprises a
combination of flexible and structured linkers. Therefore, according to a third aspect, a
method involves selecting first and second binding domains within the nucleic acid
binding polypeptide, each domain consisting of one or two nucleic acid binding
modules, and linking these by means of a flexible linker sequence comprising five or
more amino acid residues. Further binding domains (third and fourth) within the
nucleic acid binding polypeptide are then selected, each domain comprising one or
more nucleic acid binding modules, and a linker sequence comprising a structured
linker is introduced to link the third and fourth binding domains.

By "nucleic acid binding module" we mean a unit of peptide sequence which
has nucleic acid binding activity. Examples of peptide sequences having nucleic acid
binding activity include zinc fingers, leucine zippers, helix-turn-helix domains, and
homeodomains. Preferably, the nucleic acid binding polypeptide comprises a zinc
finger protein, and the nucleic acid binding modules comprise zinc fingers. A zinc
finger binding motif is a structure well known to those in the art and defined in, for
example, Miller et al., (1985) EMBO J. 4:1609-1614; Berg (1988) PNAS (USA)
85:99-102; Lee et al., (1989) Science 245:635-637; see International patent
applications WO 96/06166 and WO 96/32475, corresponding to USSN 08/422,107,
incorporated herein by reference. More preferably, the polypeptide is a zinc finger
protein of the Cys2-His2 class. Accordingly, in preferred embodiments, the nucleic
acid binding polypeptides of our invention are zinc finger proteins which comprise one
or more structured linkers, or one or more flexible linkers, or a combination of flexible
and structured linkers. Where the zinc finger protein comprises only flexible linkers,
the number of zinc fingers in each binding domain linked by a flexible linker is
preferably two. The zinc finger protein as a whole will preferably comprise 2 or more
zinc fingers, for example 2, 3, 4, 5 or 6 zinc fingers. More preferably, the polypeptide
comprises 6 zinc finger modules.

The nucleic acid binding polypeptides according to the invention need not
consist of a uniform number of modules within each linked domain. Thus,
polypeptides which comprise linked domains, in which the number of modules within
each domain is different from domain to domain, are envisaged. Our invention
therefore includes a zinc finger polypeptide comprising any combination of single
finger domains and double finger domains, for example, the polypeptide comprising:
finger pair - linker - single finger - single finger - finger pair, etc. The nucleic acid
binding polypeptides according to this invention furthermore need not consist of only a
single type of binding module. For example, hybrid polypeptides comprising more
than one type of binding module are envisaged. Such hybrids include fusion proteins
comprising: zinc finger and homeodomain, zinc finger and helix-loop-helix,
helix-loop-helix and homeodomain, etc. These hybrid polypeptides may be made by
modifications of the methods described in, for example, Pomerantz et al., 1995,
Science 267, 93-6. Such modifications are regarded as within the skills of the reader.
Furthermore, the linkages between the binding domains need not be uniform; they may
comprise flexible linkers, structured linkers, or any combination of the two.

According to a further aspect of the invention, a zinc finger domain consisting
of two zinc finger modules may be used as a basic unit or building block for the
construction of multifinger nucleic acid binding polypeptides. The two finger module
units may be linked by one or more flexible linkers, one or more structured linkers, or
a combination of the two. The two finger module units may be produced in a number
of ways, by recombinant DNA techniques, or by selection from suitable libraries. We
disclose the use of polypeptide and nucleic acid libraries, which comprise or encode
zinc finger polypeptides comprising more than one finger, in which the relevant base
contacting positions are fully or partially randomised. We show how such libraries, in
particular, libraries encoding substantially one and a half fingers, may be used to select
zinc finger pairs. We show that such multifinger polypeptides are effective in spanning
one or more gaps in the target nucleic acid sequence.

Gap Spanning and Selective Binding 

Nucleic acid binding polypeptides according to our invention are capable of
binding to nucleic acids having a number of gaps between binding subsites, and are
therefore capable of accommodating more stretches of unbound DNA within target
sequences than those previously known. They therefore allow greater flexibility in the
choice of potential binding sites. Furthermore, because the nucleic acid binding
polypeptides of our invention are capable of spanning a number of gaps of varying
stretches, they allow the targeting of the most favourable base contacts while avoiding
less favourable nucleotide sequences. By extending the linker sequence between zinc
finger pairs, we show that 3x2F peptides are able to accommodate two regions of
unbound DNA within their recognition sequence, rather than one, as is the case for
2x3F peptides. Hence, these constructs also allow more flexibility in the selection of
DNA target sequences for 'designer' transcription factors.

Furthermore, the nucleic acid binding polypeptides of our invention show a
high degree of specificity for their cognate target sites, in that the polypeptides are not
tolerant of deletions in the target sequence. We show that by changing the way in
which zinc finger arrays are constructed (by linking three 2-finger domains rather
than two 3-finger units), far greater selectivity can be achieved through increased
sensitivity to mutated or closely related sequences.

Thus, we have found that it is possible for known zinc finger proteins (for
example, those comprising canonical linkers and Zif268/NRE as disclosed in
WO 99/45132) to bind to a subsequence consisting of a cognate target sequence with a
target subsite deleted, by one or more of the fingers looping out of the protein-DNA
complex. Thus, for example, we have found that a polypeptide consisting of 6 zinc
fingers, besides being capable of binding to its cognate 18 base pair target site, is also
capable of binding to a 15 base pair subsequence consisting of a 3 base pair deletion of
the cognate 18 base pair target site. Thus, a ZIF-ZnF-GAC construct, having the
sequences shown in Figure 17, is able to bind to an 18 base pair nucleic acid sequence
consisting of the 9 base pair ZIF recognition sequence linked to the 9 base pair GAC
recognition sequence. In addition, this zinc finger construct is capable of binding with
similar affinity to nucleic acid sequences consisting of 15, 16 or 17 base pairs (i.e.,
nucleic acid constructs consisting of ZIF and GAC recognition sites, but with 3, 2 or 1
residue removed). Furthermore, this zinc finger construct is also capable of binding
with similar affinity to nucleic acid sequences consisting of 19, 20, 21, 22 and 23 base
pair nucleic acid sequences comprising the ZIF and GAC recognition sites, separated
by 1 to 5 nucleotide stretches. A selection of results from these experiments is shown
in Figures 21 and 22 and explained in further detail below in Example 17. Without
wishing to be bound by any particular theory, we believe that the versatility of binding
of ZIF-ZnF-GAC to such a wide range of sequences is probably due to the middle ZnF
finger (structured linker) being capable of looping out of the protein-DNA complex.
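
The lengths of the gapped sites discussed above follow from simple arithmetic on the two 9 base pair subsites. The following Python sketch is purely illustrative: the helper name `gapped_site` is hypothetical, the subsite and spacer sequences are those given for bsC and Z2G to Z5G in the description of Figure 21 (reading the Z5G spacer as ctagt), and the code makes no statement about binding affinity.

```python
# Illustrative sketch only: rebuilds the gapped ZIF-GAC sites named in Figure 21.
# `gapped_site` is a hypothetical helper, not part of any disclosed method.

GAC_SUBSITE = "GCGGACGCG"  # 9 bp GAC-clone recognition sequence
ZIF_SUBSITE = "GCGTGGGCG"  # 9 bp ZIF recognition sequence

# Spacer bases (lower case, as in the figure legend) for each named site.
SPACERS = {"bsC": "", "Z2G": "gt", "Z3G": "agt", "Z4G": "tagt", "Z5G": "ctagt"}


def gapped_site(spacer: str) -> str:
    """Return the GAC subsite, the spacer, then the ZIF subsite, written 5' to 3'."""
    return GAC_SUBSITE + spacer + ZIF_SUBSITE


sites = {name: gapped_site(spacer) for name, spacer in SPACERS.items()}
site_lengths = {name: len(seq) for name, seq in sites.items()}

for name, seq in sites.items():
    print(f"{name}: 5' {seq} 3' ({len(seq)} bp)")
```

Each inserted base lengthens the 18 base pair site by one, so spacers of 2 to 5 bases give sites of 20 to 23 base pairs.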

Looping out of such unbound fingers may be a general phenomenon. Thus,
zinc finger constructs consisting of 2 three finger domains linked by a linker (for
example, the 2x3F ZIF-GAC construct described below) are capable of binding
nucleic acid sequences consisting of the cognate 18 base pair ZIF-GAC site (i.e., bsC)
but with the corresponding target subsite for finger 4 deleted and replaced by 0, 1, 2, or
3 residues, with similar affinity to the full-length site. It would appear that the reason
for this is that looping out of one of the fingers in this construct leaves behind two
domains still capable of binding nucleic acid (namely a two finger domain and a three
finger domain). The strength of binding of these remaining domains is sufficient to
allow the entire construct to be bound to the sub-optimal target even with one finger
looped out. Reference is made to Figure 22 and Example 21 below. This phenomenon
allows the polydactyl peptides (based on tandemly arrayed three-finger domains)
reported in previous studies to bind with relatively high affinity to related DNA sites
containing various mutations and deletions. This would effectively mean that these
peptides would not exclusively target the desired sequences within complex genomes.

On the other hand, the 3x2F nucleic acid binding polypeptides of our invention
(in other words, three pairs of zinc fingers separated by flexible linkers) are only
capable of binding these truncated binding sites with greatly reduced affinity, in
comparison to their full-length targets. Thus, for example, a 3x2F ZGS construct binds
extremely weakly to a nucleic acid sequence consisting of the cognate 18 base pair
ZIF-GAC site (i.e., bsC) but with the corresponding target subsite for finger 4 deleted.
The affinity of a 3x2F ZGS peptide for such a sequence is similar to its affinity for a 9
base pair ZIF site. Again without wishing to be bound by any particular theory, we
believe that this is due to the fact that looping out of this finger leaves behind three
separated domains for binding; the fact that these consist of two fingers, one finger and
two fingers means that there is insufficient binding affinity for the entire construct to
bind with high affinity to the sub-optimal nucleic acid. The nucleic acid binding
polypeptides of our invention therefore exhibit far greater selectivity through increased
sensitivity to mutated or closely related sequences. Reference is made to Figure 23 and
Example 21 below.
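
The looping-out argument can be made concrete by counting the runs of adjacent fingers that remain bound when one finger leaves the complex. The Python sketch below is an illustration under a simplifying assumption, not a model of binding energy: the function name `bound_runs` is hypothetical, and a run of bound fingers is assumed to be broken both by the extended linker between domains and by the looped-out finger.

```python
# Illustrative sketch only: `bound_runs` is a hypothetical helper.
# Assumption: contiguous bound fingers within one domain act as a single
# binding unit; a looped-out finger or a domain boundary ends the unit.
from typing import List


def bound_runs(domain_sizes: List[int], looped_out: int) -> List[int]:
    """Return sizes of the bound finger runs when finger `looped_out` (1-based) is unbound."""
    total = sum(domain_sizes)
    if not 1 <= looped_out <= total:
        raise ValueError(f"looped_out must be between 1 and {total}")

    runs = []
    finger = 0
    for size in domain_sizes:
        run = 0
        for _ in range(size):
            finger += 1
            if finger == looped_out:
                # The looped-out finger ends the current run without joining it.
                if run:
                    runs.append(run)
                run = 0
            else:
                run += 1
        if run:
            runs.append(run)
    return runs


two_by_three = bound_runs([3, 3], looped_out=4)       # 2x3F architecture
three_by_two = bound_runs([2, 2, 2], looped_out=4)    # 3x2F architecture
print(two_by_three, three_by_two)
```

Under these assumptions the 2x3F array keeps a three finger and a two finger unit, whereas the 3x2F array is left with units of two, one and two fingers, which is the arrangement described in the text.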

The fact that the constructs according to this aspect of our invention, namely 
constructs in which pairs of zinc fingers are separated by flexible linkers, appear to be 
more particular in the targets they will detectably bind to is an additional factor 
contributing to their specificity. 

In summary, within a three-finger unit the sub-optimal binding of an individual
finger is better compensated for than within a two-finger unit. Therefore, by linking
pairs of fingers together (with linkers slightly longer than canonical linkers), a more
effective peptide for gene regulation is generated. In other words, the entire zinc finger
pair would contribute minimal binding energy to the peptide-DNA complex if one of
the fingers has a sub-optimal binding interaction. The design also improves six-finger
peptide-DNA interactions by allowing the peptide to adjust more regularly to the
register of the DNA double helix, reducing the strain within the complex, and
enhancing the binding affinity. Creating six-finger constructs with two or more
extended linker sequences also provides the opportunity to design extended zinc finger
peptides that are capable of binding to composite targets with two regions of unbound
DNA. The present invention therefore encompasses the use of two finger modules as a
basic unit in the design of zinc finger polypeptides.

Target Site 

A "target site" is the nucleic acid sequence recognised by a nucleic acid
binding polypeptide such as a zinc finger protein. For a zinc finger protein, the length
of a target site varies with the number of fingers present, and with the number of
sequence specific bonds formed between the protein and the target site. Typically, a
two-fingered zinc finger protein recognises a four to seven base pair target site, a
three-fingered zinc finger protein recognises a six to ten base pair target site, and a
six-fingered zinc finger protein recognises two adjacent nine to ten base pair target
sites. A "subsite" or a "target subsite" is a subsequence of the target site, and
corresponds to a portion of the target site recognised by a subunit of the nucleic acid
binding polypeptide, for example, a nucleic acid binding domain or module of the
nucleic acid binding polypeptide.

Flexible and Structured Linkers 

By "linker sequence" we mean an amino acid sequence that links together two
nucleic acid binding modules. For example, in a "wild type" zinc finger protein, the
linker sequence is the amino acid sequence lacking secondary structure which lies
between the last residue of the α-helix in a zinc finger and the first residue of the
β-sheet in the next zinc finger. The linker sequence therefore joins together two zinc
fingers. Typically, the last amino acid in a zinc finger is a threonine residue, which
caps the α-helix of the zinc finger, while a tyrosine/phenylalanine or another
hydrophobic residue is the first amino acid of the following zinc finger. Accordingly,
in a "wild type" zinc finger, glycine is the first residue in the linker, and proline is the
last residue of the linker. Thus, for example, in the Zif268 construct, the linker
sequence is G(E/Q)(K/R)P.

A "flexible" linker is an amino acid sequence which does not have a fixed
structure (secondary or tertiary structure) in solution. Such a flexible linker is therefore
free to adopt a variety of conformations. An example of a flexible linker is the
canonical linker sequence GERP/GEKP/GQRP/GQKP. Flexible linkers are also
disclosed in WO 99/45132 (Kim and Pabo). By "structured linker" we mean an amino
acid sequence which adopts a relatively well-defined conformation when in solution.
Structured linkers are therefore those which have a particular secondary and/or tertiary
structure in solution.

Determination of whether a particular sequence adopts a structure may be done
in various ways, for example, by sequence analysis to identify residues likely to
participate in protein folding, by comparison to amino acid sequences which are
known to adopt certain conformations (e.g., known alpha-helix, beta-sheet or zinc
finger sequences), by NMR spectroscopy, by X-ray diffraction of crystallised peptide
containing the sequence, etc., as known in the art.

The structured linkers of our invention preferably do not bind nucleic acid, but
where they do, then such binding is not sequence specific. Binding specificity may be
assayed for example by gel-shift as described below.

The linker may comprise any amino acid sequence that does not substantially
hinder interaction of the nucleic acid binding modules with their respective target
subsites. Preferred amino acid residues for flexible linker sequences include, but are
not limited to, glycine, alanine, serine, threonine, proline, lysine, arginine, glutamine
and glutamic acid.

The linker sequences between the nucleic acid binding domains preferably
comprise five or more amino acid residues. The flexible linker sequences according to
our invention consist of 5 or more residues, preferably, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19 or 20 or more residues. In a highly preferred embodiment of the
invention, the flexible linker sequences consist of 5, 7 or 10 residues.

Once the length of the amino acid sequence has been selected, the sequence of
the linker may be selected, for example by phage display technology (see for example
United States Patent No. 5,260,203) or using naturally occurring or synthetic linker
sequences as a scaffold (for example, GQKP and GEKP, see Liu et al., 1997, Proc.
Natl. Acad. Sci. USA 94, 5525-5530 and Whitlow et al., 1991, Methods: A Companion
to Methods in Enzymology 2: 97-105). The linker sequence may be provided by
insertion of one or more amino acid residues into an existing linker sequence of the
nucleic acid binding polypeptide. The inserted residues may include glycine and/or
serine residues. Preferably, the existing linker sequence is a canonical linker sequence
selected from GEKP, GERP, GQKP and GQRP. More preferably, each of the linker
sequences comprises a sequence selected from GGEKP, GGQKP, GGSGEKP,
GGSGQKP, GGSGGSGEKP, and GGSGGSGQKP.
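
The preferred flexible linkers listed above can all be read as a canonical linker with glycine and/or serine residues inserted in front of it. The Python sketch below illustrates that reading; the names `CANONICAL_LINKERS`, `INSERTIONS` and `extended_linkers` are hypothetical, and the code only enumerates sequences rather than predicting which linker performs best.

```python
# Illustrative sketch only: enumerates extended linkers by prefixing a
# canonical linker with Gly/Ser insertions. All names are hypothetical.
from typing import List

CANONICAL_LINKERS = ("GEKP", "GERP", "GQKP", "GQRP")

# Insertions that reproduce the 5, 7 and 10 residue linkers named in the text.
INSERTIONS = ("G", "GGS", "GGSGGS")


def extended_linkers(canonical: str) -> List[str]:
    """Return the 5, 7 and 10 residue linkers derived from one canonical linker."""
    if canonical not in CANONICAL_LINKERS:
        raise ValueError(f"{canonical!r} is not a canonical linker")
    return [insertion + canonical for insertion in INSERTIONS]


gekp_linkers = extended_linkers("GEKP")
gqkp_linkers = extended_linkers("GQKP")

for linker in gekp_linkers + gqkp_linkers:
    print(f"{linker} ({len(linker)} residues)")
```

The two calls reproduce the six sequences named in the preceding paragraph, with lengths matching the highly preferred 5, 7 and 10 residue linkers.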

Structured linker sequences are typically of a size sufficient to confer
secondary or tertiary structure to the linker; such linkers may be up to 30, 40 or 50
amino acids long. In a preferred embodiment, the structured linkers are derived from
known zinc fingers which do not bind nucleic acid, or are not capable of binding
nucleic acid specifically. An example of a structured linker of the first type is TFIIIA
finger IV; the crystal structure of TFIIIA has been solved, and this shows that finger
IV does not contact the nucleic acid (Nolte et al., 1998, Proc. Natl. Acad. Sci. USA 95,
2938-2943). An example of the latter type of structured linker is a zinc finger which
has been mutagenised at one or more of its base contacting residues to abolish its
specific nucleic acid binding capability. Thus, for example, a ZIF finger 2 which has
residues -1, 2, 3 and 6 of the recognition helix mutated to serines so that it no longer
specifically binds DNA may be used as a structured linker to link two nucleic acid
binding domains.

The use of structured or rigid linkers to jump the minor groove of DNA is
likely to be especially beneficial in (i) linking zinc fingers that bind to widely
separated (>3bp) DNA sequences, and (ii) also in minimising the loss of binding
energy due to entropic factors.
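
As a sketch of the serine-substituted structured linker described earlier in this section, the following Python snippet replaces positions -1, 2, 3 and 6 of a recognition helix with serine. The function names `helix_index` and `abolish_base_contacts` are hypothetical, and the example input RSDHLTT is an assumed seven-residue stretch (positions -1 to +6) used for illustration only; it is not taken from the sequence listing of this document.

```python
# Illustrative sketch only: all names and the example sequence are assumptions.
from typing import Iterable

SERINE_POSITIONS = (-1, 2, 3, 6)  # base-contacting positions named in the text


def helix_index(position: int) -> int:
    """Map a helix position (-1, +1, +2, ...) to a 0-based index.

    The input string is assumed to start at position -1; there is no position 0.
    """
    if position == -1:
        return 0
    if position >= 1:
        return position
    raise ValueError(f"unsupported helix position: {position}")


def abolish_base_contacts(helix: str, positions: Iterable[int] = SERINE_POSITIONS) -> str:
    """Return the helix with the given positions replaced by serine (S)."""
    residues = list(helix)
    for position in positions:
        residues[helix_index(position)] = "S"
    return "".join(residues)


mutant = abolish_base_contacts("RSDHLTT")  # assumed example helix, positions -1 to +6
print(mutant)
```

Only the four named positions change; the remaining residues are left as they were, which is consistent with the aim of retaining the fold of the finger while removing sequence specific contacts.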

Typically, the linkers are made using recombinant nucleic acids encoding the
linker and the nucleic acid binding modules, which are fused via the linker amino acid
sequence. The linkers may also be made using peptide synthesis and then linked to the
nucleic acid binding modules. Methods of manipulating nucleic acids and peptide
synthesis methods are known in the art (see, for example, Maniatis et al., 1991,
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, New York, Cold
Spring Harbor Laboratory Press).

Nucleic Acid Binding Polypeptides 

This invention relates to nucleic acid binding polypeptides. The terms
"polypeptide", "peptide" and "protein" are used interchangeably to
refer to a polymer of amino acid residues, preferably including naturally occurring
amino acid residues. Artificial analogues of amino acids may also be used in the
nucleic acid binding polypeptides, to impart the proteins with desired properties or for
other reasons. The term "amino acid", particularly in the context where "any amino
acid" is referred to, means any sort of natural or artificial amino acid or amino acid
analogue that may be employed in protein construction according to methods known in
the art. Moreover, any specific amino acid referred to herein may be replaced by a
functional analogue thereof, particularly an artificial functional analogue. Polypeptides
may be modified, for example by the addition of carbohydrate residues to form
glycoproteins.

As used herein, "nucleic acid" includes both RNA and DNA, constructed from
natural nucleic acid bases or synthetic bases, or mixtures thereof. Preferably, however,
the binding polypeptides of the invention are DNA binding polypeptides.

Particularly preferred examples of nucleic acid binding polypeptides are
Cys2-His2 zinc finger binding proteins which, as is well known in the art, bind to
target nucleic acid sequences via α-helical zinc metal atom co-ordinated binding
motifs known as zinc fingers. Each zinc finger in a zinc finger nucleic acid binding
protein is responsible for determining binding to a nucleic acid triplet, or an
overlapping quadruplet, in a nucleic acid binding sequence. Preferably, there are 2 or
more zinc fingers, for example 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 or
more zinc fingers, in each binding protein. Advantageously, the number of zinc fingers
in each zinc finger binding protein is a multiple of 2.

Thus, in one embodiment, the invention provides a method for preparing a
nucleic acid binding polypeptide of the Cys2-His2 zinc finger class capable of binding
to a target DNA sequence, in which zinc finger domains comprising one or two,
preferably two, zinc finger modules are linked by flexible linkers or structured linkers.

All of the DNA binding residue positions of zinc fingers, as referred to herein,
are numbered from the first residue in the α-helix of the finger, ranging from +1 to +9.
"-1" refers to the residue in the framework structure immediately preceding the α-helix
in a Cys2-His2 zinc finger polypeptide. Residues referred to as "++" are residues
present in an adjacent (C-terminal) finger. Where there is no C-terminal adjacent
finger, "++" interactions do not operate.

The present invention is in one aspect concerned with the production of what
are essentially artificial DNA binding proteins. In these proteins, artificial analogues of
amino acids may be used, to impart the proteins with desired properties or for other
reasons. Thus, the term "amino acid", particularly in the context where "any amino
acid" is referred to, means any sort of natural or artificial amino acid or amino acid
analogue that may be employed in protein construction according to methods known in
the art. Moreover, any specific amino acid referred to herein may be replaced by a
functional analogue thereof, particularly an artificial functional analogue. The
nomenclature used herein therefore specifically comprises within its scope functional
analogues or mimetics of the defined amino acids.

The α-helix of a zinc finger binding protein aligns antiparallel to the nucleic
acid strand, such that the primary nucleic acid sequence is arranged 3' to 5' in order to
correspond with the N-terminal to C-terminal sequence of the zinc finger. Since
nucleic acid sequences are conventionally written 5' to 3', and amino acid sequences
N-terminus to C-terminus, the result is that when a nucleic acid sequence and a zinc
finger protein are aligned according to convention, the primary interaction of the zinc
finger is with the - strand of the nucleic acid, since it is this strand which is aligned 3'
to 5'. These conventions are followed in the nomenclature used herein. It should be
noted, however, that in nature certain fingers, such as finger 4 of the protein GLI, bind
to the + strand of nucleic acid: see Suzuki et al., (1994) NAR 22:3397-3405 and
Pavletich and Pabo, (1993) Science 261:1701-1707. The incorporation of such fingers
into DNA binding molecules according to the invention is envisaged.
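
The antiparallel alignment described above can be illustrated with a short Python sketch. It is not part of the specification: the function name and the example site are hypothetical, and it assumes that the site is supplied as the strand primarily contacted by the fingers, written 5' to 3', that each finger reads one non-overlapping triplet, and that all fingers contact the same strand (so exceptions such as finger 4 of GLI are not handled).

```python
def assign_triplets(site_5_to_3):
    """Pair finger numbers (1 = N-terminal) with triplets of the contacted strand.

    Because the helix runs antiparallel to the strand it contacts, the
    N-terminal finger reads the 3'-most triplet and the C-terminal finger
    reads the 5'-most triplet.
    """
    site = site_5_to_3.upper()
    if len(site) % 3 != 0:
        raise ValueError("site length must be a multiple of 3")
    # Triplets in the conventional 5'->3' writing order.
    triplets = [site[i:i + 3] for i in range(0, len(site), 3)]
    # Reverse the order of the triplets (not the bases within them)
    # so that the list is walked 3'->5', i.e. N-terminus to C-terminus.
    return list(enumerate(reversed(triplets), start=1))


# Hypothetical 9 bp site for a three-finger protein.
for finger, triplet in assign_triplets("AAACCCGGG"):
    print(f"finger {finger} reads 5'-{triplet}-3'")
```

Each triplet is still written 5' to 3', which is the orientation in which the design rules refer to its 5', central and 3' bases.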

The present invention may be integrated with the rules set forth for zinc finger
polypeptide design in our copending European or PCT patent applications having
publication numbers WO 98/53057, WO 98/53060, WO 98/53058 and WO 98/53059, which
describe improved techniques for designing zinc finger polypeptides capable of
binding desired nucleic acid sequences. In combination with selection procedures,
such as phage display, set forth for example in WO 96/06166, these techniques enable
the production of zinc finger polypeptides capable of recognising practically any
desired sequence.

Thus, in one embodiment, the invention provides a method for preparing a
nucleic acid binding polypeptide of the Cys2-His2 zinc finger class capable of binding
to a target DNA sequence, in which zinc finger domains comprising one or two,
preferably two, zinc finger modules are linked by flexible linkers or structured linkers,
and in which binding to each base of a DNA triplet by an α-helical zinc finger DNA
binding module in the polypeptide is determined as follows: if the 5' base in the triplet
is G, then position +6 in the α-helix is Arg and/or position ++2 is Asp; if the 5' base in
the triplet is A, then position +6 in the α-helix is Gln or Glu and ++2 is not Asp; if the
5' base in the triplet is T, then position +6 in the α-helix is Ser or Thr and position ++2
is Asp; or position +6 is a hydrophobic amino acid other than Ala; if the 5' base in the
triplet is C, then position +6 in the α-helix may be any amino acid, provided that
position ++2 in the α-helix is not Asp; if the central base in the triplet is G, then
position +3 in the α-helix is His; if the central base in the triplet is A, then position +3
in the α-helix is Asn; if the central base in the triplet is T, then position +3 in the
α-helix is Ala, Ser, Ile, Leu, Thr or Val; provided that if it is Ala, then one of the
residues at -1 or +6 is a small residue; if the central base in the triplet is 5-meC, then
position +3 in the α-helix is Ala, Ser, Ile, Leu, Thr or Val; provided that if it is Ala,
then one of the residues at -1 or +6 is a small residue; if the 3' base in the triplet is G,
then position -1 in the α-helix is Arg; if the 3' base in the triplet is A, then position -1
in the α-helix is Gln and position +2 is Ala; if the 3' base in the triplet is T, then
position -1 in the α-helix is Asn; or position -1 is Gln and position +2 is Ser; if the 3'
base in the triplet is C, then position -1 in the α-helix is Asp and position +1 is Arg;
where the central residue of a target triplet is C, the use of Asp at position +3 of a zinc
finger polypeptide allows preferential binding to C over 5-meC.

The foregoing represents a set of rules which permits the design of a zinc
finger binding protein specific for any given target DNA sequence.
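
As an illustration only, the rules can be held in lookup tables keyed by the base at each position of the triplet. The Python sketch below is a simplification under stated assumptions: the table and function names are hypothetical, the conditions on position ++2, the hydrophobic alternative at +6 for a 5' T, and the small-residue proviso for Ala at +3 are not modelled, and 5-meC is omitted.

```python
# Candidate residues per contacted base, transcribed from the rules above.
# Keys are bases of the triplet written 5'->3'; residues use three-letter codes.
PLUS6_BY_5PRIME = {
    "G": ["Arg"],
    "A": ["Gln", "Glu"],
    "T": ["Ser", "Thr"],
    "C": ["any"],
}
PLUS3_BY_CENTRAL = {
    "G": ["His"],
    "A": ["Asn"],
    "T": ["Ala", "Ser", "Ile", "Leu", "Thr", "Val"],
    "C": ["Asp"],  # Asp is stated to favour C over 5-meC
}
# Some 3' rules fix a second helix position together with -1,
# so each option is a small mapping of position -> residue.
MINUS1_BY_3PRIME = {
    "G": [{"-1": "Arg"}],
    "A": [{"-1": "Gln", "+2": "Ala"}],
    "T": [{"-1": "Asn"}, {"-1": "Gln", "+2": "Ser"}],
    "C": [{"-1": "Asp", "+1": "Arg"}],
}


def helix_options(triplet):
    """Return the candidate residues for one finger reading `triplet` (5'->3')."""
    five, central, three = triplet.upper()
    return {
        "+6": PLUS6_BY_5PRIME[five],
        "+3": PLUS3_BY_CENTRAL[central],
        "-1": MINUS1_BY_3PRIME[three],
    }


print(helix_options("GAC"))
```

Returning lists rather than single residues keeps the choices visible, so that a caller can apply the provisos that this sketch leaves out.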

A zinc finger binding motif is a structure well known to those in the art and
defined in, for example, Miller et al., (1985) EMBO J. 4:1609-1614; Berg (1988)
PNAS (USA) 85:99-102; Lee et al., (1989) Science 245:635-637; see International
patent applications WO 96/06166 and WO 96/32475, corresponding to USSN
08/422,107, incorporated herein by reference.

In general, a preferred zinc finger framework has the structure:

(A) X0-2 C X1-5 C X9-14 H X3-6 H/C

where X is any amino acid, and the numbers in subscript indicate the possible
numbers of residues represented by X.

In a preferred aspect of the present invention, zinc finger nucleic acid binding
motifs may be represented as motifs having the following primary structure:

(B) Xa C X2-4 C X2-3 F Xc X  X  X  X  L  X  X  H  X  X  Xb H - linker
                          -1 1  2  3  4  5  6  7  8  9

wherein X (including Xa, Xb and Xc) is any amino acid. X2-4 and X2-3 refer to
the presence of 2 or 4, or 2 or 3, amino acids, respectively. The Cys and His residues,
which together co-ordinate the zinc metal atom, are marked in bold text and are
usually invariant, as is the Leu residue at position +4 in the α-helix. The linker, as
noted elsewhere, may comprise a flexible or a structured linker.
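
By way of illustration, structure (B) can be written as a regular expression and used to read off helix positions -1 to +9 from a finger sequence. The Python sketch below is not part of the specification; it assumes the reading of (B) as Xa C X2-4 C X2-3 F Xc, followed by ten residues at positions -1 and +1 to +9 (with Leu at +4 and His at +7), followed by Xb and the second His. The names MOTIF_B and helix_positions are hypothetical.

```python
import re

# Structure (B): Xa C X2-4 C X2-3 F Xc [X X X X L X X H X X] Xb H
# The bracketed group covers helix positions -1, +1 ... +9.
# "2 or 4" and "2 or 3" residues follow the wording of the text.
MOTIF_B = re.compile(r"C(?:.{2}|.{4})C.{2,3}F.(.{4}L.{2}H.{2}).H")

POSITIONS = ["-1", "+1", "+2", "+3", "+4", "+5", "+6", "+7", "+8", "+9"]


def helix_positions(finger):
    """Return {position: residue} for the first match of structure (B), or None."""
    match = MOTIF_B.search(finger.upper())
    if match is None:
        return None
    return dict(zip(POSITIONS, match.group(1)))


# One of the consensus sequences quoted in this specification.
print(helix_positions("PYKCPECGKSFSQKSDLVKHQRTHT"))
```

The pattern is deliberately strict; the permitted departures discussed in the text (for example Cys in place of the second His) would need a looser expression.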

Modifications to this representation may occur or be effected without
necessarily abolishing zinc finger function, by insertion, mutation or deletion of amino
acids. For example it is known that the second His residue may be replaced by Cys
(Krizek et al., (1991) J. Am. Chem. Soc. 113:4518-4523) and that Leu at +4 can in
some circumstances be replaced with Arg. The Phe residue before Xc may be replaced
by any aromatic other than Trp. Moreover, experiments have shown that departures
from the preferred structure and residue assignments for the zinc finger are tolerated
and may even prove beneficial in binding to certain nucleic acid sequences. Even
taking this into account, however, the general structure involving an α-helix
co-ordinated by a zinc atom which contacts four Cys or His residues does not alter. As
used herein, structures (A) and (B) above are taken as exemplary structures
representing all zinc finger structures of the Cys2-His2 type.

Preferably, Xa is F/Y-X or P-F/Y-X. In this context, X is any amino acid.
Preferably, in this context X is E, K, T or S. Less preferred but also envisaged are Q,
V, A and P. The remaining amino acids remain possible.

Preferably, X2-4 consists of two amino acids rather than four. The first of these
amino acids may be any amino acid, but S, E, K, T, P and R are preferred.
Advantageously, it is P or R. The second of these amino acids is preferably E,
although any amino acid may be used.

Preferably, Xb is T or I. Preferably, Xc is S or T.

Preferably, X2-3 is G-K-A, G-K-C, G-K-S or G-K-G. However, departures from
the preferred residues are possible, for example in the form of M-R-N or M-R.

As set out above, the major binding interactions occur with amino acids -1, +3
and +6. Amino acids +4 and +7 are largely invariant. The remaining amino acids may
be essentially any amino acids. Preferably, position +9 is occupied by Arg or Lys.
Advantageously, positions +1, +5 and +8 are not hydrophobic amino acids, that is to
say are not Phe, Trp or Tyr. Preferably, position ++2 is any amino acid, and preferably
serine, save where its nature is dictated by its role as a ++2 amino acid for an
N-terminal zinc finger in the same nucleic acid binding molecule.

In a most preferred aspect, therefore, bringing together the above, the invention 
allows the definition of every residue in a zinc finger DNA binding motif which will 
bind specifically to a given target DNA triplet.

The code provided by the present invention is not entirely rigid; certain choices
are provided. For example, positions +1, +5 and +8 may have any amino acid
allocation, whilst other positions may have certain options: for example, the present
rules provide that, for binding to a central T residue, any one of Ala, Ser or Val may be
used at +3. In its broadest sense, therefore, the present invention provides a very large
number of proteins which are capable of binding to every defined target DNA triplet.

Preferably, however, the number of possibilities may be significantly reduced.
For example, the non-critical residues +1, +5 and +8 may be occupied by the residues
Lys, Thr and Gln respectively as a default option. In the case of the other choices, for
example, the first-given option may be employed as a default. Thus, the code
according to the present invention allows the design of a single, defined polypeptide (a
"default" polypeptide) which will bind to its target triplet.
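
A minimal sketch of how such a "default" helix might be assembled is given below in Python. It is illustrative only: the function name is hypothetical, residues are given in one-letter code, the choice of Ser at +2 and Arg at +9 as defaults is an assumption drawn from the stated preferences (Ser at ++2; Arg or Lys at +9), and the caller is assumed to have already chosen the residues at -1, +3 and +6 from the rules.

```python
def default_helix(minus1, plus3, plus6, plus2="S", plus9="R"):
    """Assemble one-letter residues for helix positions -1, +1 ... +9."""
    helix = {
        "-1": minus1,
        "+1": "K",    # default Lys
        "+2": plus2,  # Ser assumed unless a rule dictates otherwise
        "+3": plus3,
        "+4": "L",    # largely invariant Leu
        "+5": "T",    # default Thr
        "+6": plus6,
        "+7": "H",    # largely invariant, zinc co-ordinating His
        "+8": "Q",    # default Gln
        "+9": plus9,  # Arg or Lys preferred; Arg assumed here
    }
    order = ["-1", "+1", "+2", "+3", "+4", "+5", "+6", "+7", "+8", "+9"]
    return "".join(helix[position] for position in order)


# Triplet GGG under the rules: -1 = Arg, +3 = His, +6 = Arg.
print(default_helix("R", "H", "R"))  # RKSHLTRHQR
```

Keeping +2 and +9 as parameters allows the coupled rules (for example Gln at -1 with Ala at +2) to override the defaults.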

In a further aspect of the present invention, there is provided a method for
preparing a DNA binding protein of the Cys2-His2 zinc finger class capable of binding
to a target DNA sequence, comprising the steps of: a) selecting a model zinc finger
from the group consisting of naturally occurring zinc fingers and consensus zinc
fingers; b) mutating at least one of positions -1, +3, +6 (and ++2) of the finger; and c)
inserting one or more flexible or structured linkers between zinc finger domains
comprising one or two zinc finger modules.



In general, naturally occurring zinc fingers may be selected from those fingers
for which the DNA binding specificity is known. For example, these may be the
fingers for which a crystal structure has been resolved: namely Zif268
(Elrod-Erickson et al., (1996) Structure 4:1171-1180), GLI (Pavletich and Pabo,
(1993) Science 261:1701-1707), Tramtrack (Fairall et al., (1993) Nature 366:483-487)
and YY1 (Houbaviy et al., (1996) PNAS (USA) 93:13577-13582). Preferably, the
modified nucleic acid binding polypeptide is derived from Zif268, GAC, or a Zif-GAC
fusion comprising three fingers from Zif linked to three fingers from GAC. By
"GAC-clone", we mean a three-finger variant of Zif268 which is capable of binding
the sequence GCGGACGCG, as described in Choo & Klug (1994), Proc. Natl. Acad.
Sci. USA, 91, 11163-11167.



The naturally occurring zinc finger 2 in Zif268 makes an excellent starting
point from which to engineer a zinc finger and is preferred.

Consensus zinc finger structures may be prepared by comparing the sequences
of known zinc fingers, irrespective of whether their binding domain is known.
Preferably, the consensus structure is selected from the group consisting of the
consensus structure PYKCPECGKSFSQKSDLVKHQRTHT, and the
consensus structure PYKCSECGKAFSQKSNLTRHQRIHT.

The consensuses are derived from the consensus provided by Krizek et al.,
(1991) J. Am. Chem. Soc. 113: 4518-4523 and from Jacobs, (1993) PhD thesis,
University of Cambridge, UK. In both cases, the linker sequences described above for
joining two zinc finger domains together, namely structured or flexible linkers, can be
formed on the ends of the consensus.

When the nucleic acid specificity of the model finger selected is known, the
mutation of the finger in order to modify its specificity to bind to the target DNA may
be directed to residues known to affect binding to bases at which the natural and
desired targets differ. Otherwise, mutation of the model fingers should be concentrated
upon residues -1, +3, +6 and ++2 as provided for in the foregoing rules.

In order to produce a binding protein having improved binding, moreover, the
rules provided by the present invention may be supplemented by physical or virtual
modelling of the protein/DNA interface in order to assist in residue selection.

In a further embodiment, the invention provides a method for producing a zinc
finger polypeptide capable of binding to a target DNA sequence, the method
comprising: a) providing a nucleic acid library encoding a repertoire of zinc finger
domains or modules, the nucleic acid members of the library being at least partially
randomised at one or more of the positions encoding residues -1, 2, 3 and 6 of the
α-helix of the zinc finger modules; b) displaying the library in a selection system and
screening it against a target DNA sequence; c) isolating the nucleic acid members of
the library encoding zinc finger modules or domains capable of binding to the target
sequence; and d) linking zinc finger domains comprising one or two zinc finger
modules with flexible or structured linkers.

Methods for the production of libraries encoding randomised polypeptides are 
known in the art and may be applied in the present invention. Randomisation may be 
total, or partial; in the case of partial randomisation, the selected codons preferably 
encode options for amino acids as set forth in the rules above. 
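
The practical difference between total and partial randomisation can be seen from a simple count of the protein variants encoded. The Python sketch below is illustrative only: the function name is hypothetical and the option counts used for the partial case are arbitrary example figures, not values taken from this specification.

```python
from math import prod


def library_size(options_per_position):
    """Number of distinct variants, given how many residues are allowed
    at each randomised helix position."""
    return prod(options_per_position.values())


# Total randomisation: all 20 natural amino acids at -1, +2, +3 and +6.
total = library_size({"-1": 20, "+2": 20, "+3": 20, "+6": 20})

# Partial randomisation: illustrative (hypothetical) option counts per position.
partial = library_size({"-1": 4, "+2": 3, "+3": 9, "+6": 6})

print(total, partial)  # 160000 648
```

The count is at the protein level; the number of nucleic acid members is larger wherever several codons encode the same residue.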

Zinc finger polypeptides may be designed which specifically bind to nucleic
acids incorporating the base U, in preference to the equivalent base T.

In a further preferred aspect, the invention comprises a method for producing a
zinc finger polypeptide capable of binding to a target DNA sequence, the method
comprising: a) providing a nucleic acid library encoding a repertoire of zinc finger
polypeptides each possessing more than one zinc finger, the nucleic acid members of
the library being at least partially randomised at one or more of the positions encoding
residues -1, 2, 3 and 6 of the α-helix in a first zinc finger and at one or more of the
positions encoding residues -1, 2, 3 and 6 of the α-helix in a further zinc finger of the
zinc finger polypeptides; b) displaying the library in a selection system and screening
it against a target DNA sequence; c) isolating the nucleic acid members of the library
encoding zinc finger polypeptides capable of binding to the target sequence; and d)
linking the isolated nucleic acid members with sequences encoding flexible or
structured linkers.

In this aspect, the invention encompasses library technology described in our
copending International patent application WO 98/53057, incorporated herein by
reference in its entirety. WO 98/53057 describes the production of zinc finger
polypeptide libraries in which each individual zinc finger polypeptide comprises more
than one, for example two or three, zinc fingers; and wherein within each polypeptide
partial randomisation occurs in at least two zinc fingers.

This allows for the selection of the "overlap" specificity, wherein, within each
triplet, the choice of residue for binding to the third nucleotide (read 3' to 5' on the +
strand) is influenced by the residue present at position +2 on the subsequent zinc
finger, which displays cross-strand specificity in binding. The selection of zinc finger
polypeptides incorporating cross-strand specificity of adjacent zinc fingers enables the
selection of nucleic acid binding proteins more quickly, and/or with a higher degree of
specificity than is otherwise possible.

Zinc finger binding motifs designed according to the invention may be 
combined into nucleic acid binding polypeptide molecules having a multiplicity of 
zinc fingers. Preferably, the proteins have at least two zinc fingers. The presence of at 
least three zinc fingers is preferred. Nucleic acid binding proteins may be constructed 
by joining the required fingers end to end, N-terminus to C-terminus, with flexible or
structured linkers. Preferably, this is effected by joining together the relevant nucleic 
acid sequences which encode the zinc fingers to produce a composite nucleic acid 
coding sequence encoding the entire binding protein. 

The invention therefore provides a method for producing a DNA binding
protein as defined above, wherein the DNA binding protein is constructed by
recombinant DNA technology, the method comprising the steps of: preparing a nucleic
acid coding sequence encoding a plurality of zinc finger domains or modules defined
above; inserting the nucleic acid sequence into a suitable expression vector; and
expressing the nucleic acid sequence in a host organism in order to obtain the DNA
binding protein. A "leader" peptide may be added to the N-terminal finger. Preferably,
the leader peptide is MAEEKP. This aspect of the invention is described in further
detail below.
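
The end-to-end assembly described above can be sketched at the level of the polypeptide sequence. The Python example below is illustrative only: the function name is hypothetical, the finger and linker strings are placeholders rather than real sequences, and only the leader MAEEKP is taken from the text.

```python
LEADER = "MAEEKP"  # leader peptide named in the text


def assemble(domains, between_domains, within_domain, leader=LEADER):
    """Join fingers N-terminus to C-terminus.

    domains:         list of domains, each a list of one or two finger sequences
    between_domains: flexible or structured linker placed between domains
    within_domain:   linker placed between the two fingers of one domain
    """
    for domain in domains:
        if not 1 <= len(domain) <= 2:
            raise ValueError("each domain should hold one or two fingers")
    # Join the fingers inside each domain first, then join the domains.
    joined = [within_domain.join(domain) for domain in domains]
    return leader + between_domains.join(joined)


# Placeholder strings stand in for real finger and linker sequences.
protein = assemble(
    [["<F1>", "<F2>"], ["<F3>", "<F4>"]],
    between_domains="<flexible>",
    within_domain="<canonical>",
)
print(protein)
```

In practice the joining is carried out on the encoding nucleic acid, as the text states; the same ordering logic applies to the coding sequences.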

Transcriptional Regulation 

According to a further aspect of our invention, we provide a nucleic acid
binding polypeptide comprising a repressor domain and a plurality of nucleic acid
binding domains, the nucleic acid binding domains being linked by at least one
non-canonical linker. The repressor domain is preferably a transcriptional repressor domain
selected from the group consisting of: a KRAB-A domain, an engrailed domain and a
snag domain. Such a nucleic acid binding polypeptide may comprise nucleic acid
binding domains linked by at least one flexible linker, one or more domains linked by
at least one structured linker, or both.

The nucleic acid binding polypeptides according to our invention may be
linked to one or more transcriptional effector domains, such as an activation domain or
a repressor domain. Examples of transcriptional activation domains include the VP16
and VP64 transactivation domains of Herpes Simplex Virus. Alternative
transactivation domains are various and include the maize C1 transactivation domain
sequence (Sainz et al., 1997, Mol. Cell. Biol. 17: 115-22) and P1 (Goff et al., 1992,
Genes Dev. 6: 864-75; Estruch et al., 1994, Nucleic Acids Res. 22: 3983-89) and a
number of other domains that have been reported from plants (see Estruch et al., 1994,
ibid).

Instead of incorporating a transactivator of gene expression, a repressor of gene
expression can be fused to the nucleic acid binding polypeptide and used to
down-regulate the expression of a gene contiguous with or incorporating the nucleic acid
binding polypeptide target sequence. Such repressors are known in the art and include, for
example, the KRAB-A domain (Moosmann et al., Biol. Chem. 378: 669-677 (1997)),
the engrailed domain (Han et al., EMBO J. 12: 2723-2733 (1993)) and the snag domain
(Grimes et al., Mol. Cell. Biol. 16: 6263-6272 (1996)). These can be used alone or in
combination to down-regulate gene expression.

It is known that zinc finger proteins may be fused to transcriptional repression
domains such as the Kruppel-associated box (KRAB) domain to form powerful
repressors. These fusions are known to repress expression of a reporter gene even
when bound to sites a few kilobase pairs upstream from the promoter of the gene
(Margolin et al., 1994, PNAS USA 91, 4509-4513). However, because of this, zinc
finger-KRAB fusion proteins are likely to affect the expression of many genes other
than the intended target gene. Thus, the feature of KRAB that it is capable of acting to
repress transcription at a distance is likely to limit its usefulness in gene therapy.
However, as zinc fingers of our invention are capable of spanning gaps and may
therefore be engineered to bind specifically to promoter sequences, fusion proteins
comprising KRAB together with zinc fingers of our invention are likely to be effective
in repressing transcription in a specific manner. This could be achieved by designing
zinc fingers to bind to specific promoter sequences, and making use of structured
and/or flexible linkers to span non-optimal binding sequences where these are present.
Fusion proteins comprising KRAB and these engineered finger proteins can then be
made by methods known in the art, and used to specifically repress transcription.



Nucleic Acids encoding Nucleic Acid Binding Polypeptides

The nucleic acid binding polypeptides may be constructed using recombinant
techniques as known in the art (Maniatis, et al., 1991. Molecular Cloning: A
Laboratory Manual. Cold Spring Harbor, New York. Cold Spring Harbor Laboratory
Press). Linker sequences may be introduced between the binding domains by
restriction enzyme digestion and ligation. For example, zinc finger proteins may be
constructed by joining together the relevant nucleic acid coding sequences encoding
the zinc fingers to produce a composite coding sequence with the appropriate linkers.
Alternatively and preferably, the nucleic acid binding polypeptides are modified by
mutagenesis at the existing linker sequences, for example by PCR using mutagenic
oligonucleotides. As described in further detail in the Examples, overlap PCR may be
used to create chimeric zinc finger proteins having modified linker sequences.

The nucleic acid encoding the nucleic acid binding polypeptide according to
the invention can be incorporated into vectors for further manipulation. As used
herein, vector (or plasmid) refers to discrete elements that are used to introduce
heterologous nucleic acid into cells for either expression or replication thereof.
Selection and use of such vehicles are well within the skill of the person of ordinary
skill in the art. Many vectors are available, and selection of an appropriate vector will
depend on the intended use of the vector, i.e. whether it is to be used for DNA
amplification or for nucleic acid expression, the size of the DNA to be inserted into the
vector, and the host cell to be transformed with the vector. Each vector contains
various components depending on its function (amplification of DNA or expression of
DNA) and the host cell for which it is compatible. The vector components generally
include, but are not limited to, one or more of the following: an origin of replication,
one or more marker genes, an enhancer element, a promoter, a transcription
termination sequence and a signal sequence. An example of an expression vector is
pCITE-4b (Amersham International PLC).

Both expression and cloning vectors generally contain a nucleic acid sequence
that enables the vector to replicate in one or more selected host cells. Typically in
cloning vectors, this sequence is one that enables the vector to replicate independently
of the host chromosomal DNA, and includes origins of replication or autonomously
replicating sequences. Such sequences are well known for a variety of bacteria, yeast
and viruses. The origin of replication from the plasmid pBR322 is suitable for most
Gram-negative bacteria, the 2μ plasmid origin is suitable for yeast, and various viral
origins (e.g. SV40, polyoma, adenovirus) are useful for cloning vectors in mammalian
cells. Generally, the origin of replication component is not needed for mammalian
expression vectors unless these are used in mammalian cells competent for high level
DNA replication, such as COS cells.

Most expression vectors are shuttle vectors, i.e. they are capable of replication
in at least one class of organisms but can be transfected into another class of organisms
for expression. For example, a vector is cloned in E. coli and then the same vector is
transfected into yeast or mammalian cells even though it is not capable of replicating
independently of the host cell chromosome. DNA may also be replicated by insertion
into the host genome. However, the recovery of genomic DNA encoding the nucleic
acid binding polypeptide is more complex than that of exogenously replicated vector
because restriction enzyme digestion is required to excise nucleic acid binding
polypeptide DNA. DNA can be amplified by PCR and be directly transfected into the
host cells without any replication component.

Advantageously, an expression and cloning vector may contain a selection
gene, also referred to as a selectable marker. This gene encodes a protein necessary for
the survival or growth of transformed host cells grown in a selective culture medium.
Host cells not transformed with the vector containing the selection gene will not
survive in the culture medium. Typical selection genes encode proteins that confer
resistance to antibiotics and other toxins, e.g. ampicillin, neomycin, methotrexate or
tetracycline, complement auxotrophic deficiencies, or supply critical nutrients not
available from complex media. As to a selective gene marker appropriate for yeast,
any marker gene can be used which facilitates the selection for transformants due to
the phenotypic expression of the marker gene. Suitable markers for yeast are, for
example, those conferring resistance to antibiotics G418, hygromycin or bleomycin, or
those which provide for prototrophy in an auxotrophic yeast mutant, for example the
URA3, LEU2, LYS2, TRP1, or HIS3 gene.

Since the replication of vectors is conveniently done in E. coli, an E. coli
genetic marker and an E. coli origin of replication are advantageously included. These
can be obtained from E. coli plasmids, such as pBR322, Bluescript™ vector or a pUC
plasmid, e.g. pUC18 or pUC19, which contain both E. coli replication origin and E.
coli genetic marker conferring resistance to antibiotics, such as ampicillin.

Suitable selectable markers for mammalian cells are those that enable the
identification of cells competent to take up nucleic acid binding polypeptide nucleic
acid, such as dihydrofolate reductase (DHFR, methotrexate resistance), thymidine
kinase, or genes conferring resistance to G418 or hygromycin. The mammalian cell
transformants are placed under selection pressure under which only those transformants
which have taken up and are expressing the marker are uniquely adapted to survive. In
the case of a DHFR or glutamine synthase (GS) marker, selection pressure can be
imposed by culturing the transformants under conditions in which the pressure is
progressively increased, thereby leading to amplification (at its chromosomal
integration site) of both the selection gene and the linked DNA that encodes the
nucleic acid binding polypeptide. Amplification is the process by which genes in
greater demand for the production of a protein critical for growth, together with
closely associated genes which may encode a desired protein, are reiterated in tandem
within the chromosomes of recombinant cells. Increased quantities of desired protein
are usually synthesised from thus amplified DNA.

Expression and cloning vectors usually contain a promoter that is recognised
by the host organism and is operably linked to nucleic acid binding polypeptide
encoding nucleic acid. Such a promoter may be inducible or constitutive. The
promoters are operably linked to DNA encoding the nucleic acid binding polypeptide
by removing the promoter from the source DNA by restriction enzyme digestion and
inserting the isolated promoter sequence into the vector. Both the native nucleic acid
binding polypeptide promoter sequence and many heterologous promoters may be
used to direct amplification and/or expression of nucleic acid binding polypeptide
encoding DNA.

Promoters suitable for use with prokaryotic hosts include, for example, the
β-lactamase and lactose promoter systems, alkaline phosphatase, the tryptophan (Trp)
promoter system and hybrid promoters such as the tac promoter. Their nucleotide
sequences have been published, thereby enabling the skilled worker operably to ligate
them to DNA encoding nucleic acid binding polypeptide, using linkers or adapters to
supply any required restriction sites. Promoters for use in bacterial systems will also
generally contain a Shine-Dalgarno sequence operably linked to the DNA encoding the
nucleic acid binding polypeptide.

Preferred expression vectors are bacterial expression vectors which comprise a
promoter of a bacteriophage such as phage λ or T7 which is capable of functioning in
the bacteria. In one of the most widely used expression systems, the nucleic acid
encoding the fusion protein may be transcribed from the vector by T7 RNA
polymerase (Studier et al., Methods in Enzymol. 185: 60-89, 1990). In the E. coli
BL21(DE3) host strain, used in conjunction with pET vectors, the T7 RNA
polymerase is produced from the λ-lysogen DE3 in the host bacterium, and its
expression is under the control of the IPTG inducible lac UV5 promoter. This system
has been employed successfully for over-production of many proteins. Alternatively
the polymerase gene may be introduced on a lambda phage by infection with an int-
phage such as the CE6 phage which is commercially available (Novagen, Madison,
USA). Other vectors include vectors containing the lambda PL promoter such as pLEX
(Invitrogen, NL), vectors containing the trc promoters such as pTrcHisXpress™
(Invitrogen) or pTrc99 (Pharmacia Biotech, SE) or vectors containing the tac promoter
such as pKK223-3 (Pharmacia Biotech) or pMAL (New England Biolabs, MA, USA).

Moreover, the nucleic acid binding polypeptide gene according to the invention 
preferably includes a secretion sequence in order to facilitate secretion of the 
polypeptide from bacterial hosts, such that it will be produced as a soluble native 
peptide rather than in an inclusion body. The peptide may be recovered from the
bacterial periplasmic space, or the culture medium, as appropriate. 

Suitable promoting sequences for use with yeast hosts may be regulated or
constitutive and are preferably derived from a highly expressed yeast gene, especially
a Saccharomyces cerevisiae gene. Thus, the promoter of the TRP1 gene, the ADHI or
ADHII gene, the acid phosphatase (PHO5) gene, a promoter of the yeast mating
pheromone genes coding for the a- or α-factor or a promoter derived from a gene
encoding a glycolytic enzyme such as the promoter of the enolase,
glyceraldehyde-3-phosphate dehydrogenase (GAP), 3-phosphoglycerate kinase (PGK), hexokinase,
pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase,
3-phosphoglycerate mutase, pyruvate kinase, triose phosphate isomerase,
phosphoglucose isomerase or glucokinase genes, or a promoter from the TATA
binding protein (TBP) gene can be used. Furthermore, it is possible to use hybrid
promoters comprising upstream activation sequences (UAS) of one yeast gene and
downstream promoter elements including a functional TATA box of another yeast
gene, for example a hybrid promoter including the UAS(s) of the yeast PHO5 gene and
downstream promoter elements including a functional TATA box of the yeast GAP
gene (PHO5-GAP hybrid promoter). A suitable constitutive PHO5 promoter is e.g. a
shortened acid phosphatase PHO5 promoter devoid of the upstream regulatory
elements (UAS) such as the PHO5 (-173) promoter element starting at nucleotide -173
and ending at nucleotide -9 of the PHO5 gene.

Nucleic acid binding polypeptide gene transcription from vectors in
mammalian hosts may be controlled by promoters derived from the genomes of
viruses such as polyoma virus, adenovirus, fowlpox virus, bovine papilloma virus,
avian sarcoma virus, cytomegalovirus (CMV), a retrovirus and Simian Virus 40
(SV40), from heterologous mammalian promoters such as the actin promoter or a very
strong promoter, e.g. a ribosomal protein promoter, and from the promoter normally
associated with nucleic acid binding polypeptide sequence, provided such promoters
are compatible with the host cell systems.
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Transcription of a DNA encoding nucleic acid binding polypeptide by higher 
eukaryotes may be increased by inserting an enhancer sequence into the vector. 
Enhancers are relatively orientation and position independent. Many enhancer 
sequences are known from mammalian genes (e.g. elastase and globin). However, 
5 typically one will employ an enhancer from a eukaryotic cell virus. Examples include 
the SV40 enhancer on the late side of the replication origin (bp 100-270) and the CMV 
early promoter enhancer. The enhancer may be spliced into the vector at a position 5' 
or 3' to nucleic acid binding polypeptide DNA, but is preferably located at a site 5' 
from the promoter. 

Advantageously, a eukaryotic expression vector encoding a nucleic acid
binding polypeptide according to the invention may comprise a locus control region
(LCR). LCRs are capable of directing high-level integration site independent
expression of transgenes integrated into host cell chromatin, which is of importance
especially where the nucleic acid binding polypeptide gene is to be expressed in the
context of a permanently-transfected eukaryotic cell line in which chromosomal
integration of the vector has occurred, or in transgenic animals.

Eukaryotic vectors may also contain sequences necessary for the termination of
transcription and for stabilising the mRNA. Such sequences are commonly available
from the 5' and 3' untranslated regions of eukaryotic or viral DNAs or cDNAs. These
regions contain nucleotide segments transcribed as polyadenylated fragments in the
untranslated portion of the mRNA encoding nucleic acid binding polypeptide.

An expression vector includes any vector capable of expressing nucleic acid
binding polypeptide nucleic acids that are operatively linked with regulatory
sequences, such as promoter regions, that are capable of expression of such DNAs.
Thus, an expression vector refers to a recombinant DNA or RNA construct, such as a
plasmid, a phage, recombinant virus or other vector, that upon introduction into an
appropriate host cell, results in expression of the cloned DNA. Appropriate expression
vectors are well known to those with ordinary skill in the art and include those that are
replicable in eukaryotic and/or prokaryotic cells and those that remain episomal or
those which integrate into the host cell genome. For example, DNAs encoding nucleic
acid binding polypeptide may be inserted into a vector suitable for expression of
cDNAs in mammalian cells, e.g. a CMV enhancer-based vector such as pEVRF
(Matthias, et al., (1989) NAR 17, 6418).

Particularly useful for practising the present invention are expression vectors
that provide for the transient expression of DNA encoding nucleic acid binding
polypeptide in mammalian cells. Transient expression usually involves the use of an
expression vector that is able to replicate efficiently in a host cell, such that the host
cell accumulates many copies of the expression vector, and, in turn, synthesises high
levels of nucleic acid binding polypeptide. For the purposes of the present invention,
transient expression systems are useful e.g. for identifying nucleic acid binding
polypeptide mutants, to identify potential phosphorylation sites, or to characterise
functional domains of the protein.

Construction of vectors according to the invention employs conventional
ligation techniques. Isolated plasmids or DNA fragments are cleaved, tailored, and
religated in the form desired to generate the plasmids required. If desired, analysis to
confirm correct sequences in the constructed plasmids is performed in a known
fashion. Suitable methods for constructing expression vectors, preparing in vitro
transcripts, introducing DNA into host cells, and performing analyses for assessing
nucleic acid binding polypeptide expression and function are known to those skilled in
the art. Gene presence, amplification and/or expression may be measured in a sample
directly, for example, by conventional Southern blotting, Northern blotting to
quantitate the transcription of mRNA, dot blotting (DNA or RNA analysis), or in situ
hybridisation, using an appropriately labelled probe which may be based on a sequence
provided herein. Those skilled in the art will readily envisage how these methods may
be modified, if desired.

In accordance with another embodiment of the present invention, there are
provided cells containing the above-described nucleic acids. Host cells such as
prokaryote, yeast and higher eukaryote cells may be used for replicating DNA and
producing the nucleic acid binding polypeptide. Suitable prokaryotes include
eubacteria, such as Gram-negative or Gram-positive organisms, such as E. coli, e.g. E.
coli K-12 strains, DH5α and HB101, or Bacilli. Further hosts suitable for the nucleic
acid binding polypeptide encoding vectors include eukaryotic microbes such as
filamentous fungi or yeast, e.g. Saccharomyces cerevisiae. Higher eukaryotic cells
include insect and vertebrate cells, particularly mammalian cells including human cells
or nucleated cells from other multicellular organisms. Propagation of vertebrate cells
in culture (tissue culture) is a routine procedure and tissue culture techniques are
known in the art. Examples of useful mammalian host cell lines are epithelial or
fibroblastic cell lines such as Chinese hamster ovary (CHO) cells, NIH 3T3 cells,
HeLa cells or 293T cells. The host cells referred to in this disclosure comprise cells in
in vitro culture as well as cells that are within a host animal.



DNA may be stably incorporated into cells or may be transiently expressed
using methods known in the art. Stably transfected mammalian cells may be prepared
by transfecting cells with an expression vector having a selectable marker gene, and
growing the transfected cells under conditions selective for cells expressing the marker
gene. To prepare transient transfectants, mammalian cells are transfected with a
reporter gene to monitor transfection efficiency. To produce such stably or transiently
transfected cells, the cells should be transfected with a sufficient amount of the nucleic
acid binding polypeptide-encoding nucleic acid to form the nucleic acid binding
polypeptide. The precise amounts of DNA encoding the nucleic acid binding
polypeptide may be empirically determined and optimised for a particular cell and
assay.



Host cells are transfected or, preferably, transformed with the expression or
cloning vectors of this invention and cultured in conventional nutrient media modified
as appropriate for inducing promoters, selecting transformants, or amplifying the genes
encoding the desired sequences. Heterologous DNA may be introduced into host cells
by any method known in the art, such as transfection with a vector encoding a
heterologous DNA by the calcium phosphate coprecipitation technique or by
electroporation. Numerous methods of transfection are known to the skilled worker in
the field. Successful transfection is generally recognised when any indication of the
operation of this vector occurs in the host cell. Transformation is achieved using
standard techniques appropriate to the particular host cells used.

Incorporation of cloned DNA into a suitable expression vector, transfection of
eukaryotic cells with a plasmid vector or a combination of plasmid vectors, each
encoding one or more distinct genes or with linear DNA, and selection of transfected
cells are well known in the art (see, e.g. Sambrook et al., 1989 Molecular Cloning: A
Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press).

Transfected or transformed cells are cultured using media and culturing
methods known in the art, preferably under conditions whereby the nucleic acid
binding polypeptide encoded by the DNA is expressed. The composition of suitable
media is known to those in the art, so that they can be readily prepared. Suitable
culturing media are also commercially available.

The binding affinity of the nucleic acid binding polypeptides according to our
invention may be improved by randomising the polypeptides and selecting for
improved binding. Methods for randomisation are disclosed in, for example,
WO96/06166. Thus, zinc finger molecules designed according to the invention may be
subjected to limited randomisation and subsequent selection, such as by phage display,
in order to optimise the binding characteristics of the molecule.

The sequences of zinc finger binding motifs may be randomised at selected
sites and the randomised molecules obtained may be screened and selected for
molecules having the most advantageous properties. Generally, those molecules
showing higher affinity and/or specificity for the target nucleic acid sequence are
selected. Mutagenesis and screening of target nucleic acid molecules may be achieved
by any suitable means. Preferably, the mutagenesis is performed at the nucleic acid
level, for example by synthesising novel genes encoding mutant proteins and
expressing these to obtain a variety of different proteins. Alternatively, existing genes
can be themselves mutated, such as by site-directed or random mutagenesis, in order to
obtain the desired mutant genes.
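
To give a feel for how quickly diversity grows when selected sites are randomised,
the sketch below counts the variants produced by one common degenerate-codon
scheme. The choice of NNK codons (N = A/C/G/T, K = G/T) is an assumption made
for illustration only; the text does not specify how the randomised positions are
encoded, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons():
    """All codons matching NNK: any base, any base, then G or T."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GT")]


def summarise_nnk():
    """Return (number of codons, distinct amino acids, stop codons) for NNK."""
    encoded = [CODON_TABLE[c] for c in nnk_codons()]
    residues = set(encoded) - {"*"}
    return len(encoded), len(residues), encoded.count("*")


def library_size(randomised_positions: int):
    """DNA-level and protein-level diversity for n NNK-randomised positions."""
    n_codons, n_residues, _ = summarise_nnk()
    return n_codons ** randomised_positions, n_residues ** randomised_positions


print(summarise_nnk())  # (32, 20, 1)
print(library_size(3))  # (32768, 8000)
```

Because diversity grows exponentially with the number of randomised positions,
restricting randomisation to selected sites keeps the library within a size that can
be screened.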

Instead of, or in addition to, randomisation of the zinc finger sequence, a
particular amino acid sequence may be chosen on the basis of rules which determine
the optimal sequence for binding to any particular nucleic acid sequence. Such rules
are disclosed, for example, in our International Application PCT/GB98/01516
(published as WO98/53060).

Mutations may be performed by any method known to those of skill in the art.
Preferred, however, is site-directed mutagenesis of a nucleic acid sequence encoding
the protein of interest. A number of methods for site-directed mutagenesis are known
in the art, from methods employing single-stranded phage such as M13 to PCR-based
techniques (see "PCR Protocols: A guide to methods and applications", M.A. Innis,
D.H. Gelfand, J.J. Sninsky, T.J. White (eds.), Academic Press, New York, 1990). The
commercially available Altered Site II Mutagenesis System (Promega) may be
employed, according to the directions given by the manufacturer.

Screening of the proteins produced by mutant genes is preferably performed by
expressing the genes and assaying the binding ability of the protein product. A simple
and advantageously rapid method by which this may be accomplished is by phage
display, in which the mutant polypeptides are expressed as fusion proteins with the
coat proteins of filamentous bacteriophage, such as the minor coat protein pIII of
bacteriophage M13 or gene III of bacteriophage Fd, and displayed on the capsid of
bacteriophage transformed with the mutant genes. The target nucleic acid sequence is
used as a probe to bind directly to the protein on the phage surface and select the phage
possessing advantageous mutants, by affinity purification. The phage are then
amplified by passage through a bacterial host, and subjected to further rounds of
selection and amplification in order to enrich the mutant pool for the desired phage and
eventually isolate the preferred clone(s). Detailed methodology for phage display is
known in the art and set forth, for example, in US Patent 5,223,409; Choo and Klug,
(1995) Current Opinion in Biotechnology 6:431-436; Smith, (1985) Science
228:1315-1317; and McCafferty et al., (1990) Nature 348:552-554; all incorporated
herein by reference. Vector systems and kits for phage display are available
commercially, for example from Pharmacia.
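
The effect of repeated rounds of selection and amplification can be illustrated with
a deliberately simple model. It assumes that every round multiplies the odds of
desired phage to undesired phage by a constant enrichment factor; the starting
fraction and the factor used below are hypothetical values chosen only to show the
trend, not measurements.

```python
def enrich(initial_fraction: float, enrichment_per_round: float, rounds: int):
    """Fraction of desired phage in the pool before and after each round.

    Assumed model: each round multiplies the odds (desired : undesired) by
    `enrichment_per_round`; amplification is taken to preserve the ratio.
    """
    if not 0.0 < initial_fraction < 1.0:
        raise ValueError("initial_fraction must lie strictly between 0 and 1")
    if enrichment_per_round <= 0.0 or rounds < 0:
        raise ValueError("enrichment must be positive and rounds non-negative")
    odds = initial_fraction / (1.0 - initial_fraction)
    fractions = [initial_fraction]
    for _ in range(rounds):
        odds *= enrichment_per_round
        fractions.append(odds / (1.0 + odds))
    return fractions


# Hypothetical numbers: one desired clone per million, 1000-fold per round.
for round_number, fraction in enumerate(enrich(1e-6, 1000.0, 3)):
    print(round_number, f"{fraction:.6f}")
```

With these illustrative values the desired clone goes from one in a million to the
majority of the pool within three rounds, which is why a few cycles are usually
enough to isolate preferred clones.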



Binding affinity may also be assayed by means of a gel-shift assay, in which
the mobility of a substrate in a gel is reduced in the presence of binding by a
polypeptide. The nucleic acid substrate is labelled by, for example, 32P, for the
band-shift to be easily visualised.
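
As a sketch of how band-shift data might be converted into an affinity, the code
below assumes a simple 1:1 binding model with the polypeptide in large excess over
the labelled substrate, so that the fraction of shifted substrate is
[P] / (Kd + [P]). The model, the function names and the data are assumptions for
illustration; the text does not prescribe an analysis method.

```python
def fraction_bound(protein_conc: float, kd: float) -> float:
    """Fraction of labelled substrate shifted, for a 1:1 binding model."""
    return protein_conc / (kd + protein_conc)


def estimate_kd(protein_concs, fractions) -> float:
    """Estimate Kd from (concentration, fraction shifted) pairs.

    Uses the linearised form 1/theta - 1 = Kd * (1/[P]) and a least-squares
    fit through the origin. Fractions must lie in (0, 1].
    """
    xs, ys = [], []
    for conc, theta in zip(protein_concs, fractions):
        if conc <= 0.0 or not 0.0 < theta <= 1.0:
            raise ValueError("need positive concentrations and fractions in (0, 1]")
        xs.append(1.0 / conc)
        ys.append(1.0 / theta - 1.0)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Synthetic, noise-free data generated with a hypothetical Kd of 2.0 (e.g. nM).
concs = [0.5, 1.0, 2.0, 4.0, 8.0]
observed = [fraction_bound(c, 2.0) for c in concs]
print(round(estimate_kd(concs, observed), 6))  # 2.0
```

The linearised fit weights the low-concentration lanes heavily, so with real, noisy
data a non-linear fit of the binding curve would normally be preferred.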

Uses 



Nucleic acid binding polypeptides according to the invention may be employed
in a wide variety of applications, including diagnostics and as research tools.
Advantageously, they may be employed as diagnostic tools for identifying the
presence of nucleic acid molecules in a complex mixture. Nucleic acid binding
molecules according to the invention may be used to differentiate single base pair
changes in target nucleic acid molecules. In a preferred embodiment, the nucleic acid
binding molecules of the invention can be incorporated into an ELISA assay. For
example, phage displaying the molecules of the invention can be used to detect the
presence of the target nucleic acid, and visualised using enzyme-linked anti-phage
antibodies.

Further improvements to the use of zinc finger phage for diagnosis can be
made, for example, by co-expressing a marker protein fused to the minor coat protein
(gVIII) of bacteriophage. Since detection with an anti-phage antibody would then be
obsolete, the time and cost of each diagnosis would be further reduced. Depending on
the requirements, suitable markers for display might include the fluorescent proteins
(A. B. Cubitt, et al., (1995) Trends Biochem Sci 20, 448-455; T. T. Yang, et al., (1996)
Gene 173, 19-23), or an enzyme such as alkaline phosphatase which has been
previously displayed on gIII (J. McCafferty, R. H. Jackson, D. J. Chiswell, (1991)
Protein Engineering 4, 955-961). Labelling different types of diagnostic phage with
distinct markers would allow multiplex screening of a single nucleic acid sample.
Nevertheless, even in the absence of such refinements, the basic ELISA technique is
reliable, fast, simple and particularly inexpensive. Moreover it requires no specialised
apparatus, nor does it employ hazardous reagents such as radioactive isotopes, making
it amenable to routine use in the clinic. The major advantage of the protocol is that it
obviates the requirement for gel electrophoresis, and so opens the way to automated
nucleic acid diagnosis.

Polypeptides made according to the invention may be employed in the
manufacture of chimeric restriction enzymes, in which a nucleic acid cleaving domain
is fused to a nucleic acid binding polypeptide comprising for example a zinc finger as
described herein. Moreover, the invention provides therapeutic agents and methods of
therapy involving use of nucleic acid binding polypeptides as described herein. In
particular, the invention provides the use of polypeptide fusions comprising an
integrase, such as a viral integrase, and a nucleic acid binding polypeptide according
to the invention to target nucleic acid sequences in vivo (Bushman, 1994 PNAS USA
91:9233-9237). In gene therapy applications, the method may be applied to the
delivery of functional genes into defective genes, or the delivery of nonsense nucleic
acid in order to disrupt undesired nucleic acid. Alternatively, genes may be delivered
to known, repetitive stretches of nucleic acid, such as centromeres, together with an
activating sequence such as an LCR. This represents a route to the safe and predictable
incorporation of nucleic acid into the genome.

In conventional therapeutic applications, nucleic acid binding polypeptides
according to the invention may be used to specifically knock out cells having mutant
vital proteins. For example, if cells with mutant ras are targeted, they will be destroyed
because ras is essential to cellular survival. Alternatively, the action of transcription
factors may be modulated, preferably reduced, by administering to the cell agents
which bind to the binding site specific for the transcription factor. For example, the
activity of HIV tat may be reduced by binding proteins specific for HIV TAR.
Moreover, binding proteins according to the invention may be coupled to toxic
molecules, such as nucleases, which are capable of causing irreversible nucleic acid
damage and cell death. Such agents are capable of selectively destroying cells which
comprise a mutation in their endogenous nucleic acid. Nucleic acid binding
polypeptides and derivatives thereof as set forth above may also be applied to the
treatment of infections and the like in the form of organism-specific antibiotic or
antiviral drugs. In such applications, the binding proteins may be coupled to a nuclease
or other nuclear toxin and targeted specifically to the nucleic acids of microorganisms.



Poly-zinc finger peptides, with their ability to bind with high affinity to long
(>18 bp) DNA target sequences, are likely to be used more and more in the search for
gene therapy treatments and applications such as transgenic plants / animals. However,
for such applications to be effective and safe it is crucial that high affinity zinc finger
peptides are also highly specific. This is of particular importance given the extremely
slow off rates observed for extended zinc finger arrays (Kim, J-S. & Pabo, C.O.
(1998) Proc. Natl. Acad. Sci. USA 95, 2812-2817). The zinc fingers disclosed in this
document better satisfy both these requirements. We have achieved this by creating a
design of six-finger peptides, which not only gives a slightly higher affinity than a
comparable 2x3F peptide but, more importantly, has far greater specificity for its
full-length target. The two-finger units employed also allow greater flexibility in the
selection of target sites by allowing one or two gaps of non-bound DNA, and reduce
the library size required to select specific binding domains by techniques such as
phage display. 3x2F peptides will greatly enhance the application of zinc finger arrays
for the in vivo control of gene expression.
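
The flexibility offered by two-finger units can be pictured with a short sketch of
the target-site layout. It assumes, as a simplification not stated in this passage,
that each finger contacts a 3 bp subsite, so that a two-finger unit covers 6 bp;
the gap lengths are hypothetical inputs and the function name is illustrative.

```python
BP_PER_FINGER = 3  # simplifying assumption: one finger contacts a 3 bp subsite


def target_layout(units: int = 3, fingers_per_unit: int = 2, gaps=(0, 0)):
    """Describe the DNA footprint of an array of finger units.

    Returns (layout, contacted_bp, total_span_bp), where the layout marks
    contacted base pairs with 'X' and non-bound gap base pairs with 'n'.
    """
    if len(gaps) != units - 1:
        raise ValueError("need exactly one gap length between adjacent units")
    if any(g < 0 for g in gaps):
        raise ValueError("gap lengths must be non-negative")
    unit_bp = fingers_per_unit * BP_PER_FINGER
    parts = []
    for i in range(units):
        parts.append("X" * unit_bp)
        if i < units - 1:
            parts.append("n" * gaps[i])
    layout = "".join(parts)
    return layout, units * unit_bp, len(layout)


print(target_layout())             # contiguous 18 bp site
print(target_layout(gaps=(1, 2)))  # same 18 contacted bp spread over 21 bp
```

A 3x2F arrangement has two junctions at which a gap can be introduced, whereas a
2x3F arrangement (`units=2, fingers_per_unit=3`) has only one.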



Proteins and polypeptides suitable for treatment using the nucleic acid binding
proteins of our invention include those involved in diseases such as cardiovascular,
inflammatory, metabolic, infectious (viral, bacterial, fungal, etc), genetic, neurological,
rheumatological, dermatological, and musculoskeletal diseases. In particular, the
invention provides nucleic acid binding proteins suitable for the treatment of diseases,
syndromes and conditions such as hypertrophic cardiomyopathy, bacterial
endocarditis, agyria, amyotrophic lateral sclerosis, tetralogy of fallot, myocarditis,
anemia, brachial plexus neuropathies, hemorrhoids, congenital heart defects, alopecia
areata, sickle cell anemia, mitral valve prolapse, autonomic nervous system diseases,
alzheimer disease, angina pectoris, rectal diseases, arrhythmogenic right ventricular
dysplasia, acne rosacea, amblyopia, ankylosing spondylitis, atrial fibrillation, cardiac
tamponade, acquired immunodeficiency syndrome, amyloidosis, autism, brain
neoplasms, central nervous system diseases, color vision defects, arteriosclerosis,
breast diseases, central nervous system infections, colorectal neoplasms, arthritis,
behcet's syndrome, breast neoplasms, cerebral palsy, common cold, asthma, bipolar
disorder, burns, cervix neoplasms, communication disorders, atherosclerosis,
candidiasis, charcot-marie disease, crohn disease, attention deficit disorder, brain
injuries, cataract, ulcerative colitis, cumulative trauma disorders, cystic fibrosis,
developmental disabilities, eating disorders, erysipelas, fibromyalgia, decubitus ulcer,
diabetes, emphysema, escherichia coli infections, folliculitis, deglutition disorders,
diabetic foot, encephalitis, esophageal diseases, food hypersensitivity, dementia, down
syndrome, Japanese encephalitis, eye neoplasms, dengue, dyslexia, endometriosis,
fabry's disease, gastroenteritis, depression, dystonia, chronic fatigue syndrome,
gastroesophageal reflux, gaucher's disease, hematologic diseases, hirschsprung disease,
hydrocephalus, hyperthyroidism, gingivitis, hemophilia, histiocytosis, hyperhidrosis,
hypoglycemia, glaucoma, hepatitis, hiv infections, hyperoxaluria, hypothyroidism,
glycogen storage disease, hepatolenticular degeneration, hodgkin disease,
hypersensitivity, immunologic deficiency syndromes, hernia, holt-oram syndrome,
hypertension, impotence, congestive heart failure, herpes genitalis, huntington's
disease, pulmonary hypertension, incontinence, infertility, leukemia, systemic lupus
erythematosus, maduromycosis, mental retardation, inflammation, liver neoplasms,
lyme disease, malaria, inborn errors of metabolism, inflammatory bowel diseases, long
qt syndrome, lymphangiomyomatosis, measles, migraine, influenza, low back pain,
lymphedema, melanoma, mouth abnormalities, obstructive lung diseases, lymphoma,
meningitis, mucopolysaccharidoses, leprosy, lung neoplasms, macular degeneration,
menopause, multiple sclerosis, muscular dystrophy, myofascial pain syndromes,
osteoarthritis, pancreatic neoplasms, peptic ulcer, myasthenia gravis, nausea,
osteoporosis, panic disorder, myeloma, acoustic neuroma, otitis media, paraplegia,
phenylketonuria, myeloproliferative disorders, nystagmus, ovarian neoplasms,
parkinson disease, pheochromocytoma, myocardial diseases, opportunistic infections,
pain, pars planitis, phobic disorders, myocardial infarction, hereditary optic atrophy,
pancreatic diseases, pediculosis, plague, poison ivy dermatitis, prion diseases, reflex
sympathetic dystrophy, schizophrenia, shyness, poliomyelitis, prostatic diseases,
respiratory tract diseases, scleroderma, Sjogren's syndrome, polymyalgia rheumatica,
prostatic neoplasms, restless legs, scoliosis, skin diseases, postpoliomyelitis syndrome,
psoriasis, retinal diseases, scurvy, skin neoplasms, precancerous conditions, rabies,
retinoblastoma, sex disorders, sleep disorders, pregnancy, sarcoidosis, sexually
transmitted diseases, spasmodic torticollis, spinal cord injuries, testicular neoplasms,
trichotillomania, urinary tract infections, spinal dysraphism, substance-related
disorders, thalassemia, trigeminal neuralgia, urogenital diseases, spinocerebellar
degeneration, sudden infant death, thrombosis, tuberculosis, vascular diseases,
strabismus, tinnitus, tuberous sclerosis, post-traumatic stress disorders, syringomyelia,
tourette syndrome, turner's syndrome, vision disorders, psychological stress,
temporomandibular joint dysfunction syndrome, trachoma, urinary incontinence, von
willebrand's disease, renal osteodystrophy, bacterial infections, digestive system
neoplasms, bone neoplasms, vulvar diseases, ectopic pregnancy, tick-borne diseases,
marfan syndrome, aging, williams syndrome, angiogenesis factor, urticaria, sepsis,
malabsorption syndromes, wounds and injuries, cerebrovascular accident, multiple
chemical sensitivity, dizziness, hydronephrosis, yellow fever, neurogenic arthropathy,
hepatocellular carcinoma, pleomorphic adenoma, vater's ampulla, meckel's
diverticulum, keratoconus, skin warts, sick building syndrome, urologic diseases,
ischemic optic neuropathy, common bile duct calculi, otorhinolaryngologic diseases,
superior vena cava syndrome, sinusitis, radius fractures, osteitis deformans,
trophoblastic neoplasms, chondrosarcoma, carotid stenosis, varicose veins,
creutzfeldt-jakob syndrome, gallbladder diseases, replacement of joint, vitiligo, nose
diseases, environmental illness, megacolon, pneumonia, vestibular diseases,
cryptococcosis, herpes zoster, fallopian tube neoplasms, infection, arrhythmia, glucose
intolerance, neuroendocrine tumors, scabies, alcoholic hepatitis, parasitic diseases,
salpingitis, cryptococcal meningitis, intracranial aneurysm, calculi, pigmented nevus,
rectal neoplasms, mycoses, hemangioma, colonic neoplasms, hypervitaminosis a,
nephrocalcinosis, kidney neoplasms, vitamins, carcinoid tumor, celiac disease,
pituitary diseases, brain death, biliary tract diseases, prostatitis, iatrogenic disease,
gastrointestinal hemorrhage, adenocarcinoma, toxic megacolon, amputees, seborrheic
keratosis, osteomyelitis, barrett esophagus, hemorrhage, stomach neoplasms,
chickenpox, cholecystitis, chondroma, bacterial infections and mycoses, parathyroid
neoplasms, spermatic cord torsion, adenoma, lichen planus, anal gland neoplasms,
lipoma, tinea pedis, alcoholic liver diseases, neurofibromatoses, lymphatic diseases,
elder abuse, eczema, diverticulitis, carcinoma, pancreatitis, amebiasis, pyelonephritis,
and infectious mononucleosis, etc.

Pharmaceutical Compositions 

The invention likewise relates to pharmaceutical preparations which contain
the compounds according to the invention or pharmaceutically acceptable salts thereof
as active ingredients, and to processes for their preparation. The pharmaceutical
preparations according to the invention which contain the compound according to the
invention or pharmaceutically acceptable salts thereof are those for enteral, such as
oral, furthermore rectal, and parenteral administration to, for example, warm-blooded
animal(s), the pharmacologically active ingredient being present on its own or together
with a pharmaceutically acceptable carrier. The dose of the active ingredient depends
on the species, age and the individual condition and also on the manner of
administration. For example, in the normal case, an approximate daily dose of about
10 mg to about 250 mg is to be estimated in the case of oral administration for a
human patient weighing approximately 75 kg.

The novel pharmaceutical preparations contain, for example, from about 10 %
to about 80 %, preferably from about 20 % to about 60 %, of the active ingredient.
Pharmaceutical preparations according to the invention for enteral or parenteral
administration are, for example, those in unit dose forms, such as sugar-coated tablets,
tablets, capsules or suppositories, and ampoules. These are prepared in a manner
known in the art, for example by means of conventional mixing, granulating,
sugar-coating, dissolving or lyophilising processes. Thus, pharmaceutical preparations
for oral use can be obtained by combining the active ingredient with solid carriers, if
desired granulating a mixture obtained, and processing the mixture or granules, if
desired or necessary, after addition of suitable excipients to give tablets or
sugar-coated tablet cores.

Suitable carriers are, in particular, fillers, such as sugars, for example lactose,
sucrose, mannitol or sorbitol, cellulose preparations and/or calcium phosphates, for
example tricalcium phosphate or calcium hydrogen phosphate, furthermore binders,
such as starch paste, using, for example, corn, wheat, rice or potato starch, gelatin,
tragacanth, methylcellulose and/or polyvinylpyrrolidone, if desired, disintegrants, such
as the abovementioned starches, furthermore carboxymethyl starch, crosslinked
polyvinylpyrrolidone, agar, alginic acid or a salt thereof, such as sodium alginate;
auxiliaries are primarily glidants, flow-regulators and lubricants, for example silicic
acid, talc, stearic acid or salts thereof, such as magnesium or calcium stearate, and/or
polyethylene glycol. Sugar-coated tablet cores are provided with suitable coatings
which, if desired, are resistant to gastric juice, using, inter alia, concentrated sugar
solutions which, if desired, contain gum arabic, talc, polyvinylpyrrolidone,
polyethylene glycol and/or titanium dioxide, coating solutions in suitable organic
solvents or solvent mixtures or, for the preparation of gastric juice-resistant coatings,
solutions of suitable cellulose preparations, such as acetylcellulose phthalate or
hydroxypropylmethylcellulose phthalate. Colorants or pigments, for example to
identify or to indicate different doses of active ingredient, may be added to the tablets
or sugar-coated tablet coatings.

Other orally utilisable pharmaceutical preparations are hard gelatin capsules,
and also soft closed capsules made of gelatin and a plasticiser, such as glycerol or
sorbitol. The hard gelatin capsules may contain the active ingredient in the form of
granules, for example in a mixture with fillers, such as lactose, binders, such as
starches, and/or lubricants, such as talc or magnesium stearate, and, if desired,
stabilisers. In soft capsules, the active ingredient is preferably dissolved or suspended
in suitable liquids, such as fatty oils, paraffin oil or liquid polyethylene glycols, it also
being possible to add stabilisers.

Suitable rectally utilisable pharmaceutical preparations are, for example,
suppositories, which consist of a combination of the active ingredient with a
suppository base. Suitable suppository bases are, for example, natural or synthetic
triglycerides, paraffin hydrocarbons, polyethylene glycols or higher alkanols.
Furthermore, gelatin rectal capsules which contain a combination of the active
ingredient with a base substance may also be used. Suitable base substances are, for
example, liquid triglycerides, polyethylene glycols or paraffin hydrocarbons.

Suitable preparations for parenteral administration are primarily aqueous
solutions of an active ingredient in water-soluble form, for example a water-soluble
salt, and furthermore suspensions of the active ingredient, such as appropriate oily
injection suspensions, using suitable lipophilic solvents or vehicles, such as fatty oils,
for example sesame oil, or synthetic fatty acid esters, for example ethyl oleate or
triglycerides, or aqueous injection suspensions which contain viscosity-increasing
substances, for example sodium carboxymethylcellulose, sorbitol and/or dextran, and,
if necessary, also stabilisers.

Two Finger Module Libraries 

The present invention includes a method of constructing multi-finger zinc
finger proteins which are based on a construction unit of two fingers. The use of
combinatorial libraries for generating two-zinc finger DNA binding domains is
disclosed. We further describe a number of linkers that are suitable in constructing
multifinger proteins and that are especially suitable for use with construction units of
two fingers.

According to this aspect of the invention, combinatorial library systems may be
used to generate two-finger construction units. Such libraries take advantage of a
number of features of the libraries described in published patent applications WO
98/53057, WO 98/53058, WO 98/53059, and WO 98/53060 which are hereby
incorporated by reference. In particular, the libraries are constructed in such a way as
to enable the synergistic interaction between the two fingers which comprise the
selected two-finger construction unit to be utilised.

We have determined that DNA-binding subunits comprising two-zinc finger
domains may be engineered through the variety of approaches described herein, each
of which has distinct advantages for creating DNA-binding proteins. In each of the
libraries detailed here, amino acid randomizations are made at various positions in the
two zinc finger structures. Preferred randomizations are described here as well as in
patent applications WO 96/06166, WO 98/53057, WO 98/53058, WO 98/53059, and
WO 98/53060. However, a more restricted number of randomizations may be utilized
in library construction to facilitate the process of construction. The library construction
methods described herein can be used in conjunction with a variety of selection
methods including phage display and ribosome display as detailed in patent
applications WO 97/53057 and WO 00/27878, both of which are incorporated herein
by reference.

In one approach, an isolated two finger library is constructed, which comprises
amino acids known to contribute to DNA-binding affinity and specificity. Since the
library does not contain a DNA-binding "anchor", the register of the interaction is not
strictly fixed, so this library may suitably be used for applications where either (i) the
precise register of interaction is not critical for subsequent applications, or (ii) very
short DNA targets [6-7 bp] are used in the selection procedure, thereby fixing the
interaction more precisely.

It is highly desirable to engineer 2-finger domains whose register of interaction
is precisely fixed, and which can be targeted to any DNA sequence. We have shown
that this can be achieved by employing "GCG" anchors (although any other anchor
sequence can be employed) and two extensively-randomised zinc fingers. The libraries
are designed to take into account synergistic effects between zinc fingers, by
modifying cross-strand contacts from position 2. Consequently, position 2 of F2 is
modified to Ser or Ala so as to interact universally with either the r C in the "GCG"
anchor, or any base (^N) in the final target site sequence. Similarly, position 2 of F3 is
modified to Ser or Ala so as not to interfere with the selection of bases 4 *X or 4 X. As
before, after selecting against particular DNA target sites, the genes for the appropriate
2-finger domains may be easily recovered by PCR.

In a further approach, two previously constructed libraries (Lib12 and Lib23, as
described in WO 98/53057) are readily adapted to provide a resource of 2-finger
subunits. These two libraries have been extensively characterised and used for the
selection of zinc finger modules of 1.5 fingers, each of which is then recombined to
generate a 3-finger module (see WO 98/53057). We now show that these libraries can
be used to select two finger units that bind DNA sites of the form 5'-GXX XXX-3' or
5'-XXX XXG-3' (where X is any base). After selecting against particular DNA target
sites, the genes for the appropriate 2-finger domains may be easily recovered by PCR.
Because of the design of the libraries, the "GCGG" or "GGCG" anchors serve to fix
the register of DNA-protein interaction very precisely. Despite the fact that one base
must be fixed as "G" in each target site, this still allows 2048 of all the 4096 (= 4^6)
possible 6-base 2-finger recognition sites to be targeted.



The general principle is demonstrated below. 



Library    Binding Site (5'-3')
           F3    F2    F1

LIB12      GCG   GXX   XXX
LIB23      XXX   XXG   GCG



Therefore, LIB12 may be used to select a novel 2-finger unit that binds a 6 bp
site with a 5' guanine. Similarly, LIB23 can be used to select a novel 2-finger unit that
binds a 6 bp site with a 3' guanine.
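
The site-counting argument above can be checked by brute-force enumeration. The following Python sketch is illustrative only and is not part of the library design itself; the function name `libraries_for_site` is hypothetical. It assumes, as stated above, that LIB12 addresses 6 bp sites of the form 5'-GXXXXX-3' and LIB23 addresses sites of the form 5'-XXXXXG-3'. Each library addresses 4^5 = 1024 sites, which gives the combined figure of 2048; sites of the form 5'-GXXXXG-3' fall within reach of both libraries, so the number of distinct sequences is somewhat lower.

```python
from itertools import product

BASES = "ACGT"

def libraries_for_site(site):
    """Return the libraries that could address a 6 bp site written 5'->3'."""
    if len(site) != 6 or any(base not in BASES for base in site):
        raise ValueError("expected a 6-base site over A, C, G, T")
    libraries = []
    if site[0] == "G":       # 5'-GXX XXX-3' sites
        libraries.append("LIB12")
    if site[-1] == "G":      # 5'-XXX XXG-3' sites
        libraries.append("LIB23")
    return libraries

all_sites = ["".join(p) for p in product(BASES, repeat=6)]
lib12_sites = [s for s in all_sites if "LIB12" in libraries_for_site(s)]
lib23_sites = [s for s in all_sites if "LIB23" in libraries_for_site(s)]
distinct_sites = [s for s in all_sites if libraries_for_site(s)]

# Total sites, sites per library, and distinct sites reachable by either library.
print(len(all_sites), len(lib12_sites), len(lib23_sites), len(distinct_sites))
```

A site with guanine at both ends, for example GACGTG, is returned with both library names, so either library could in principle be used for it.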

Accordingly, we have recognized that the concept of selection of two-finger
construction units need not require full randomization of both zinc fingers, as libraries
can be generated which provide for the fixing of one (or more) of the base contacting
positions and selection against a DNA sequence that incorporates the corresponding
nucleotide at the pre-determined base contacting position. Libraries may, for example,
be constructed from zinc finger proteins in which two of the nucleotides of either
target triplet are fixed. Using Zif268 as the backbone this would, for example, allow
selection of two finger modules which target the sequence 5'-GGNNNN-3' or
5'-NNNNGG-3'. Using other backbone zinc fingers, the fixed nucleotides may be other
nucleotides.

In an extension of this concept, it will be appreciated that Lib12 and Lib23 can
be used to select 2-finger domains which bind the sequences GCGGXX or XXGGCG
respectively.

Further advantages offered by 2-finger domains include the following: (a) the
2-finger domains are independent so no problems are encountered when fusing
separately selected units; (b) no further rounds of selection are required after selecting
individual 2-finger domains; (c) 3x2F peptides are more specific than 2x3F peptides;
(d) 3x2F peptides allow two 1 bp gaps to be accommodated within the target sequence;
(e) with minor modifications to the libraries any 6 bp sequence can be targeted in one
go; (f) complete binding site signatures may be possible for entire 2-finger units by
DNA micro-array ELISA. Thus, as indicated in (d) above, 3x2F peptides allow two 1
bp gaps to be accommodated within the target sequence; indeed, 2-finger units bind
with optimal efficiency when within 1 bp of each other.

The invention is further described, for the purposes of illustration only, in the 
following examples. 

Examples 

Example 1: Constructs, Targets and Nomenclature

In order to combine the benefits of tight binding to an extended DNA
sequence, coupled with the flexibility to skip bases in the DNA target site, we
designed a series of six fingered chimeric zinc finger proteins derived from wild type
ZIF fused to a GAC-clone. Each construct comprises three pairs of zinc fingers
separated by extended, flexible linker peptides. These are termed "3x2F peptides".



One such flexible linker construct comprises the fingers of the wt ZIF and
GAC with zinc finger pairs separated by -GG(E/Q)KP- and is termed 3x2F ZGS (Figure
3). This peptide targets the contiguous DNA binding sequence, bsC (Table 1), which
comprises the wt ZIF and GAC-clone binding sites. To allow some variation in the
binding sites targeted by the 3x2F protein, finger pairs are also separated by
-GGSG(E/Q)KP- or -GGSGGSG(E/Q)KP- linker sequences to create the 3x2F ZGL and
3x2F ZGXL constructs respectively (Figures 4 and 5). These peptides are targeted
against the contiguous ZIF-GAC binding site (bsC), and against the binding sites bsD
and bsE (Table 1), which contain 1 or 2 bps, respectively, between the recognition
sequences for the zinc finger pairs. Similar constructs are also synthesised in which
two-finger units are separated by linkers containing either glycine or Gly-Gly-Ser
insertions. These constructs are termed 3x2F ZGSL and 3x2F ZGLS (Figures 6 and 7)
and are targeted against the appropriate binding sites, bsF and bsG (Table 1).

Constructs are also made comprising structured linkers. One such construct
comprises the first four fingers of TFIIIA (including the F4-F5 linker peptide) joined
to the N-terminus of the three-finger ZIF peptide. The resultant seven-finger peptide is
denoted TF(F1-4)-ZIF (Example 15 and Figures 13 and 15), and is targeted to
non-contiguous binding sites containing the TFIIIA F1-3 and wt ZIF sites separated by 5 to
10 bps of DNA (Table 2). The second construct is created by substituting the first three
fingers of TFIIIA in the above fusion peptide with the three-finger GAC-clone, and is
denoted GAC-F4-ZIF (Example 16 and Figures 14 and 16). This peptide is targeted
against the non-contiguous binding sites (Table 3), which comprise the GAC-clone
and wt ZIF recognition sites separated by 6 to 11 bps of DNA. A third structured linker
construct is ZIF-ZnF-GAC, which consists of the three finger peptide of ZIF linked to a
three fingered GAC-clone using a "neutral" finger linker, i.e., a wild type ZIF268
finger 2 with the amino acids at positions -1, 2, 3 and 6 replaced with serine residues.

Further constructs are also made. ZIF-F4-GAC comprises finger 4 of TFIIIA
inserted between Zif268 and the mutant Zif268 clone GAC (which is a phage selected
variant of Zif268 capable of binding GCG GAC GCG). The linkers found naturally in
TFIIIA between finger 3 and finger 4 (-NIKICV-) and between finger 4 and finger 5
(-TQQLP-) are retained in both the above peptides. ZIF-F4mut-GAC is identical to
ZIF-F4-GAC, except that the linkers flanking finger 4 of TFIIIA are replaced by canonical
linkers having the sequence GERP. ZIF-mutZnF-GAC is identical to ZIF-ZnF-GAC,
except that the TFIIIA finger 4 flanking sequences comprise -NIKICV- and -TQQLP-.
TF(1-3)-flex-ZIF and ZIF-flex-GAC contain the 20 amino acid sequence
-TG(GSG)5ERP- between their respective three-finger domains.
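
The shorthand for the flexible linker can be expanded mechanically to confirm the stated length. The sketch below is a convenience check only; the helper name `expand_linker` is hypothetical, and the shorthand is assumed to mean that the GSG unit is repeated five times.

```python
import re

def expand_linker(shorthand):
    """Expand repeat units such as '(GSG)5' in one-letter peptide shorthand."""
    return re.sub(
        r"\(([A-Z]+)\)(\d+)",
        lambda match: match.group(1) * int(match.group(2)),
        shorthand,
    )

flex_linker = expand_linker("TG(GSG)5ERP")
print(flex_linker, len(flex_linker))
```

Under that reading the expanded linker has two N-terminal residues, fifteen residues of GSG repeats and three C-terminal residues.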

Example 2: Construction of 3x2F ZGS Zinc Finger Construct 

The 3x2F ZGS zinc finger construct is created by linking the third finger of
wild-type ZIF to the first finger of the GAC-clone using the peptide sequence GERP.
To divide the new peptides into three pairs of fingers, one glycine residue is inserted
into the peptide linker between fingers 2 and 3 of wild type ZIF and between fingers 1
and 2 of the GAC-clone. The amino acid and nucleotide sequences of the 3x2F ZGS
construct are shown in Figure 3.

The construction of 3x2F ZGS is described with reference to Figures 1 and 3.
As shown in Figure 1, the 3x2F ZGS construct is made by mutagenic PCR of wild type
ZIF and GAC-clone templates. ZIF and GAC-clone templates are as described in Choo
& Klug (1994), Proc. Natl. Acad. Sci. USA 91, 11163-11167. Four pairs of
oligonucleotide primers, A + a, B + b, C + c and D + d, are used. As indicated in Figure
1, primers A, a, B and b are used to amplify and mutagenise wild type ZIF sequence,
while primers C, c, D and d are used to amplify the GAC-clone. The sequences of
primers A and d comprise restriction sites for NdeI and NotI respectively, while
primers C and b comprise EagI recognition sites. Primers B and D are mutagenic
oligonucleotides, whose sequences comprise linker sequences from wild type ZIF
(primer B) and GAC (primer D) but with additional nucleotide sequence coding for
additional amino acid residues. These linker sequences are chosen from the linker
between finger 2 and finger 3 of wild type ZIF (primers a and B) and the linker
between finger 1 and finger 2 of the GAC clone (primers c and D). For example, in the
case of 3x2F ZGS, primers B and D each include an additional GGC triplet to code for
glycine.

To construct the 3x2F ZGS clone, wild type ZIF sequence is amplified by
means of primers A, a, B and b, while GAC-clone sequence is amplified by means of
primers C, c, D and d. The respective amplification products are then subjected to
overlap PCR, with a template fill-in step. Finally, each of the products is amplified
with end primers A + b and C + d. The amplification products are then digested with
EagI, and ligated at that site. The full length product comprising sequence encoding
the 6 finger protein is then digested with NotI and NdeI, and ligated into NotI/NdeI
digested pCITE-4b vector (Amersham International plc). pCITE-4b is a eukaryotic
expression vector containing a T7 transcription promoter and an internal eukaryotic
ribosome translation entry site for protein expression. Plasmids containing the
zinc-finger constructs are propagated in E. coli XL1-Blue (Stratagene) cells.

The sequences of oligonucleotide primers A, a, B, b, C, c, D and d for
construction of 3x2F ZGS are shown below, in which restriction sites used in cloning
and inserted glycine codons are shown in bold, while annealing sequences for PCR are
underlined:

Primer A (SEQ ID NO: 1): 

Nde I START 
5' CAG CCG GCC CAT ATG CGT CTA GAC GCC GCC ATG GCA GAA CGC CCG TAT GCT TG 3' 

Primer a (SEQ ID NO: 2): 

5' CTG TGT GGG TGC GGA TGT GGG T 3'

Primer B (SEQ ID NO: 3): 

Gly 

5' ACC CAC ATC CGC ACC CAC ACA GGT GGC GAG AAG CCT TTT GCC 3'

Primer b (SEQ ID NO: 4): 

Eag I 

5' GCA AGC ATA CGG CCG TTC ACC GGT ATG GAT TTT GGT ATG CCT CTT GCG T 3'

Primer C (SEQ ID NO: 5): 
Eag I 

5' ATG GCA GAA CGG CCG TAT GCT TGC CC 3' 

Primer c (SEQ ID NO: 6): 

5' GTG TGG ATG CGG ATA TGG CGG GT 3'

Primer D (SEQ ID NO: 7): 

Gly 

5' CCC GCC ATA TCC GCA TCC ACA CAG GTG GCC AGA AGC CCT TCC AG 3'



Primer d (SEQ ID NO: 8):

Not I STOP

5' TCA TTC AAG TGC GGC CGC TTA GGA ATT CCG GGC CGC GTC CTT CTG TCT TAA ATG GAT TTT GG 3'
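
Two properties of the primer design described in Example 2 can be checked directly from the sequences: that primer a is the reverse complement of the 5' end of primer B (so that the two amplification products overlap for the overlap PCR step), and that primer B encodes the extra glycine in the linker. The Python sketch below is illustrative only. It assumes that primer a reads 5' CTG TGT GGG TGC GGA TGT GGG T 3', that the triplet grouping printed for primer B is its reading frame, and it lists only the codons that occur in primer B. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard genetic code, restricted to the codons present in primer B.
CODONS = {
    "ACC": "T", "ACA": "T", "CAC": "H", "ATC": "I", "CGC": "R",
    "GGT": "G", "GGC": "G", "GAG": "E", "AAG": "K", "CCT": "P",
    "TTT": "F", "GCC": "A",
}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons of seq, starting at its first base."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, usable, 3))

primer_a = "CTG TGT GGG TGC GGA TGT GGG T".replace(" ", "")
primer_B = "ACC CAC ATC CGC ACC CAC ACA GGT GGC GAG AAG CCT TTT GCC".replace(" ", "")

# Primer a should anneal to the strand copied from the 5' end of primer B.
overlap = reverse_complement(primer_a)
print(primer_B.startswith(overlap))

# The inserted GGC codon lengthens the canonical TGEKP linker to TGGEKP.
print(translate(primer_B))
```

The same check can be applied to primers c and D once primer D is placed in its reading frame, which is offset from the triplet grouping printed above.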

Example 3: Construction of the ZIF-GAC Fusion Construct 

The control construct ZIF-GAC is created by joining the third finger of ZIF to
the first finger of the GAC-clone using the peptide sequence described by Kim and
Pabo (1998, Proc. Natl. Acad. Sci. USA 95, 2812-2817), -LRQKDGERP-. This linker
is designed to have compatible ends with the adjacent zinc finger sequences. A
modification of the method as described above for Example 2 is used. Thus, primers A
and b (primer b having the sequence shown below) are used to amplify wild type ZIF,
while primers C and d are used to amplify the GAC clone, and the two amplified
sequences are joined together. The amino acid and nucleotide sequence of the ZIF-GAC
fusion construct is shown in Figure 2. The oligonucleotide primer sequences A, C and
d as shown in Example 2 are used for constructing ZIF-GAC, except that primer b has
the following sequence:

Primer b (SEQ ID NO: 9):

Eag I Gly

5' GCA AGC ATA CGG CCG TTC GCC GTC CTT CTG TCT TAA ATG GAT TTT GG 3'

Example 4: Construction of 3x2F ZGL Zinc Finger Construct 

The 3x2F ZGL construct is created using the same method as described above
for Example 2, except that amino acid residues GGS are inserted into the linker
sequence between fingers 2 and 3 of wild type ZIF and into the linker sequence
between fingers 1 and 2 of the GAC-clone. The amino acid and nucleotide sequence of
3x2F ZGL is shown in Figure 4. The oligonucleotide primer sequences used for
constructing 3x2F ZGL are the same as for 3x2F ZGS (Example 2), except for the
following:

Primer B (SEQ ID NO: 10):

Gly Gly Ser

5' ACC CAC ATC CGC ACC CAC ACA GGC GGT TCT GGC GAG AAG CCT TTT GCC 3'

Primer D (SEQ ID NO: 11):

Gly Gly Ser

5' CCC GCC ATA TCC GCA TCC ACA CAG GCG GTT CTG GCC AGA AGC CCT TCC AG 3'



Example 5: Construction of 3x2F ZGXL Zinc Finger Construct 

The 3x2F ZGXL construct is created using the same method as described
above for Example 2, except that amino acid residues GGSGGS are inserted into the
linker sequence between fingers 2 and 3 of wild type ZIF and into the linker sequence
between fingers 1 and 2 of the GAC-clone. The amino acid and nucleotide sequence of
3x2F ZGXL is shown in Figure 5. The oligonucleotide primer sequences used for
constructing 3x2F ZGXL are the same as for 3x2F ZGS (Example 2), except for the
following:

Primer B (SEQ ID NO: 12):

Gly Gly Ser Gly Gly Ser

5' ACC CAC ATC CGC ACC CAC ACA GGC GGT TCT GGC GGT TCT GGC GAG AAG CCT TTT GCC 3'

Primer D (SEQ ID NO: 13):

Gly Gly Ser Gly Gly Ser

5' CCC GCC ATA TCC GCA TCC ACA CAG GCG GTT CTG GCG GTT CTG GCC AGA AGC CCT TCC AG 3'



Example 6: Construction of 3x2F ZGSL Zinc Finger Construct 

The 3x2F ZGSL construct is created using the same method as described above
for Example 2, except that a single glycine residue is inserted into the linker sequence
between fingers 2 and 3 of wild type ZIF, and amino acid residues GGS are inserted
into the linker sequence between fingers 1 and 2 of the GAC-clone. The amino acid
and nucleotide sequence of 3x2F ZGSL is shown in Figure 6. The oligonucleotide
primer sequences used for constructing 3x2F ZGSL are the same as for 3x2F ZGS
(Example 2), except for the following:

Primer D (SEQ ID NO: 11):

Gly Gly Ser

5' CCC GCC ATA TCC GCA TCC ACA CAG GCG GTT CTG GCC AGA AGC CCT TCC AG 3'



Example 7: Construction of 3x2F ZGLS Zinc Finger Construct 



The 3x2F ZGLS construct is created using the same method as described above
for Example 2, except that amino acid residues GGS are inserted into the linker
sequence between fingers 2 and 3 of wild type ZIF, and a single glycine residue is
inserted into the linker sequence between fingers 1 and 2 of the GAC-clone. The
amino acid and nucleotide sequence of 3x2F ZGLS is shown in Figure 7. The
oligonucleotide primer sequences used for constructing 3x2F ZGLS are the same as
for 3x2F ZGS (Example 2), except for the following:

Primer B (SEQ ID NO: 10):

Gly Gly Ser

5' ACC CAC ATC CGC ACC CAC ACA GGC GGT TCT GGC GAG AAG CCT TTT GCC 3'

Example 8: Protein Expression

The zinc-finger constructs are expressed in vitro by coupled transcription and
translation in the TNT Quick Coupled Transcription/Translation System (Promega)
using the manufacturer's instructions, except that the medium is supplemented with
ZnCl2 to 500 µM. To judge relative protein expression levels, translation products are
labelled with 35S-met and visualised by autoradiography, following SDS-PAGE.

Example 9: Gel Shift Assays

All constructs are assayed using 32P end-labelled synthetic oligonucleotide
duplexes containing the required binding site sequences. The coding strand sequences
of the binding sites used in gel shift experiments with peptides containing flexible
linkers are shown below in Table 1. Table 2 shows the coding strand sequences of the
binding sites used in gel shift experiments with peptides containing structured linkers.

DNA binding reactions contain the appropriate zinc-finger peptide, binding site
and 1 µg competitor DNA (poly dI-dC) in a total volume of 10 µl, which contains: 20
mM Bis-tris propane (pH 7.0), 100 mM NaCl, 5 mM MgCl2, 50 µM ZnCl2, 5 mM
DTT, 0.1 mg/ml BSA, 0.1% Nonidet P40. Incubations are performed at room
temperature for 1 hour.



Name  Sequence                        Putative target for construct  SEQ ID

bsA   GCG TGG GCG                     Wild type ZIF/3x1F Zif         14

bsB   GCG GAC GCG                     GAC-clone (wild-type           15
                                      binding site sequences for
                                      fingers 1 and 3, middle
                                      finger binds GAC)

bsC   GCG GAC GCG GCG TGG GCG         ZIF-GAC and 3x2F ZGS           16
                                      (contiguous 18 bp site
                                      comprising wt ZIF and
                                      GAC-clone sites)

bsD   GCG GAC T GCG GCG T TGG GCG     3x2F ZGL (2-finger/6 bp        17
                                      sites separated by 1 bp)

bsE   GCG GAC TC GCG GCG TC TGG GCG   3x2F ZGXL (2-finger/6 bp       18
                                      sites separated by 2 bps)

bsF   GCG GAC T GCG GCG TGG GCG       3x2F ZGSL (1 bp gap            19
                                      between the binding sites
                                      for the first and second
                                      fingers of the GAC-clone)

bsG   GCG GAC GCG GCG T TGG GCG       3x2F ZGLS (1 bp gap            20
                                      between the binding sites
                                      for the second and third
                                      fingers of wt ZIF)

Table 1. The binding site sequences contained within the oligonucleotides used in gel
shift experiments with peptides containing flexible linkers.
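
The target sites in Table 1 are all built from the same three 6 bp subsites, with 0, 1 or 2 bp of spacer DNA inserted between them. The Python sketch below rebuilds each site from that description as a consistency check on the table. It is illustrative only: the names `SUBSITES`, `SPACERS` and `build_site` are hypothetical, and the subsites are taken from bsC read 5' to 3'.

```python
# 6 bp subsites of the contiguous site bsC, listed 5'->3'.
SUBSITES = ["GCGGAC", "GCGGCG", "TGGGCG"]

# Spacer DNA between subsites 1/2 and 2/3 for each site in Table 1.
SPACERS = {
    "bsC": ("", ""),
    "bsD": ("T", "T"),
    "bsE": ("TC", "TC"),
    "bsF": ("T", ""),
    "bsG": ("", "T"),
}

def build_site(subsites, spacers):
    """Join subsites 5'->3', placing spacers[i] between subsite i and i+1."""
    if len(spacers) != len(subsites) - 1:
        raise ValueError("need exactly one spacer per junction")
    parts = [subsites[0]]
    for spacer, subsite in zip(spacers, subsites[1:]):
        parts.extend([spacer, subsite])
    return "".join(parts)

for name, spacers in SPACERS.items():
    site = build_site(SUBSITES, spacers)
    print(name, site, len(site))
```

Each rebuilt sequence can be compared with the corresponding Table 1 entry after removing the spaces that separate triplets and spacer bases in the table.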


Name  Sequence                        Notes                                                            SEQ ID

bsA1  GCGTGGGCGTACCTGGATGGGAGAC       ZIF and TFIIIA (F1-3) binding sites separated by 5 nucleotides   39
bsB1  GCGTGGGCGGTACCTGGATGGGAGAC      ZIF and TFIIIA (F1-3) binding sites separated by 6 nucleotides   40
bsC1  GCGTGGGCGAGTACCTGGATGGGAGAC     ZIF and TFIIIA (F1-3) binding sites separated by 7 nucleotides   41
bsD1  GCGTGGGCGTAGTACCTGGATGGGAGAC    ZIF and TFIIIA (F1-3) binding sites separated by 8 nucleotides   42
bsE1  GCGTGGGCGTTAGTACCTGGATGGGAGAC   ZIF and TFIIIA (F1-3) binding sites separated by 9 nucleotides   43
bsF1  GCGTGGGCGGTTAGTACCTGGATGGGAGAC  ZIF and TFIIIA (F1-3) binding sites separated by 10 nucleotides  44
bsG1  GCGTGGGCGCTTGACGGATGGGAGAC      ZIF and TFIIIA (F1-3) binding sites separated by 6 nucleotides   45
bsH1  GCGTGGGCGAAAAAAGGATGGGAGAC      ZIF and TFIIIA (F1-3) binding sites separated by 6 nucleotides   46

Table 2. The binding site sequences contained within the oligonucleotides used in gel
shift experiments with the TFIIIA (F1-4)-ZIF peptide. The binding site sequences of
TFIIIA F1-3 and wild-type ZIF (bold) are separated by between 5 and 10 bps of DNA.
The DNA sequence used to separate the binding sites is based on the sequence
spanned by TFIIIA-finger 4 in the Internal Control Region of the 5S rRNA gene,
TFIIIA's natural binding site. To investigate any possible sequence preference for the
region spanned by TFIIIA-finger 4, oligonucleotides containing an altered sequence
(bsG1), or 6 adenine residues (bsH1) are designed and tested in bandshifts.
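
The spacer lengths quoted in Table 2 can be recovered from the sequences themselves. The Python sketch below is a consistency check only. It assumes that the 9 bp segment GCGTGGGCG at the 5' end is the wild-type ZIF site and that the 11 bp segment GGATGGGAGAC shared by every entry at the 3' end is the TFIIIA F1-3 site; the second assignment is inferred from the table, not stated in the text. The names `spacer` and `TABLE2` are hypothetical.

```python
ZIF_SITE = "GCGTGGGCG"
TFIIIA_SITE = "GGATGGGAGAC"   # assumed extent of the TFIIIA F1-3 site

TABLE2 = {
    "bsA1": "GCGTGGGCGTACCTGGATGGGAGAC",
    "bsB1": "GCGTGGGCGGTACCTGGATGGGAGAC",
    "bsC1": "GCGTGGGCGAGTACCTGGATGGGAGAC",
    "bsD1": "GCGTGGGCGTAGTACCTGGATGGGAGAC",
    "bsE1": "GCGTGGGCGTTAGTACCTGGATGGGAGAC",
    "bsF1": "GCGTGGGCGGTTAGTACCTGGATGGGAGAC",
    "bsG1": "GCGTGGGCGCTTGACGGATGGGAGAC",
    "bsH1": "GCGTGGGCGAAAAAAGGATGGGAGAC",
}

def spacer(site, left=ZIF_SITE, right=TFIIIA_SITE):
    """Return the DNA lying between the two flanking recognition sites."""
    if not (site.startswith(left) and site.endswith(right)):
        raise ValueError("site is not flanked by the expected sequences")
    return site[len(left):len(site) - len(right)]

for name, sequence in TABLE2.items():
    gap = spacer(sequence)
    print(name, gap, len(gap))
```

With these assumptions the extracted spacers range from 5 to 10 nucleotides, and bsG1 and bsH1 carry 6-nucleotide spacers of altered sequence, in agreement with the Notes column.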




Name  Sequence                        Notes                                                        SEQ ID

bsA2  GCGTGGGCGTACCTGGCGGACGCG        ZIF and GAC-clone binding sites separated by 6 nucleotides   47
bsB2  GCGTGGGCGGTACCTGGCGGACGCG       ZIF and GAC-clone binding sites separated by 7 nucleotides   48
bsC2  GCGTGGGCGAGTACCTGGCGGACGCG      ZIF and GAC-clone binding sites separated by 8 nucleotides   49
bsD2  GCGTGGGCGTAGTACCTGGCGGACGCG     ZIF and GAC-clone binding sites separated by 9 nucleotides   50
bsE2  GCGTGGGCGTTAGTACCTGGCGGACGCG    ZIF and GAC-clone binding sites separated by 10 nucleotides  51
bsF2  GCGTGGGCGGTTAGTACCTGGCGGACGCG   ZIF and GAC-clone binding sites separated by 11 nucleotides  52

Table 3. The binding site sequences contained within the oligonucleotides used in gel
shift experiments with the GAC-F4-ZIF peptide. The binding site sequences of the
GAC-clone and wild-type ZIF (bold) are separated by between 6 and 11 bps of DNA.
The DNA sequence spanned in each case is based on the sequence spanned by
TFIIIA-finger 4 in the ICR of the 5S rRNA gene, as described above in Figure 2.



Relative dissociation constants are determined by creating 5-fold serial
dilutions of the required peptide and incubating with the appropriate binding site at a
constant concentration, which is in general between 0.1 and 0.2 nM. The concentration
of protein at which 50% of the binding site is bound is compared for each peptide, with
either the full length or part-binding site sequences, to assess the difference in binding
affinity. In cases where a non-total bandshift appears only in lanes containing the
lowest concentration of peptide, it is likely that the amount of shift is limited by
protein concentration rather than by affinity. Therefore, the relative difference in
affinity is likely to be greater than that observed and shown.
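
The comparison of half-maximal binding points across a 5-fold dilution series can be sketched numerically. The Python example below is illustrative only and uses synthetic data: it assumes a simple 1:1 binding isotherm with the binding site at a concentration well below the dissociation constant, arbitrary concentration units, and hypothetical function names. The 50% point is estimated by interpolating between neighbouring dilutions on a logarithmic concentration scale, which is one reasonable choice and not necessarily the procedure used for the gels described here.

```python
import math

def dilution_series(start, fold, steps):
    """Return `steps` concentrations, each `fold` times lower than the last."""
    return [start / fold ** i for i in range(steps)]

def fraction_bound(protein, kd):
    """Fraction of binding site bound for a 1:1 isotherm (site << Kd)."""
    return protein / (protein + kd)

def half_max_concentration(concs, fractions):
    """Estimate the concentration giving 50% bound by log-linear interpolation."""
    points = sorted(zip(concs, fractions))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= 0.5 <= f_hi:
            if f_hi == f_lo:
                return c_lo
            t = (0.5 - f_lo) / (f_hi - f_lo)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None  # the 50% point lies outside the tested range

# Synthetic example: one peptide titrated against two sites (hypothetical Kd values).
series = dilution_series(start=625.0, fold=5, steps=8)
tight = half_max_concentration(series, [fraction_bound(p, 1.0) for p in series])
weak = half_max_concentration(series, [fraction_bound(p, 125.0) for p in series])
print(weak / tight)  # relative difference in affinity between the two sites
```

When the 50% point falls outside the dilution range the function returns `None`, which corresponds to the situation described above in which only a bound on the relative affinity can be given.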

Example 9A. Active Peptide Concentration 

To determine the concentration of zinc finger peptide produced in the in vitro
expression system, crude protein samples are used in gel-shift assays against a dilution
series of the appropriate binding site. Binding site concentration is always well above
the Kd of the peptide, but ranges from a higher concentration than the peptide (80
nM), at which all available peptide binds DNA, to a lower concentration (3-5 nM), at
which all DNA is bound. Controls are carried out to ensure that binding sites are not
shifted by the in vitro extract in the absence of zinc finger peptide. The reaction
mixtures are then separated on a 7% native polyacrylamide gel. Radioactive signals are
quantitated by PhosphorImager analysis to determine the amount of shifted binding
site, and hence, the concentration of active zinc finger peptide.

Example 9B. Binding Affinity and Specificity

Dissociation constants are determined in parallel to the calculation of active
peptide concentration. Serial 3, 4 or 5-fold dilutions of crude peptide are made and
incubated with radiolabeled binding site (0.1 pM - 500 pM depending on the peptide),
as above. Samples are run on 7% native polyacrylamide gels and the radioactive
signals quantitated by PhosphorImager analysis. The data are then analysed according to
linear transformation of the binding equation and plotted in CA-Cricket Graph III
(Computer Associates Inc., NY) to generate the apparent dissociation constants. The
Kd values reported are the average of at least two separate studies.
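
The text does not state which linear transformation of the binding equation is used. As one possible illustration, the Python sketch below applies a double-reciprocal transformation: for a 1:1 isotherm with fraction bound f = [P]/([P] + Kd), the relation 1/f = 1 + Kd/[P] is linear in 1/[P], so a straight-line fit gives Kd as slope divided by intercept. This is an assumed approach, shown with synthetic data and hypothetical function names, and it presumes that the free peptide concentration is close to the total active peptide concentration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def apparent_kd(protein_concs, fractions_bound):
    """Apparent Kd from a double-reciprocal plot of 1/f against 1/[P]."""
    xs = [1.0 / p for p in protein_concs]
    ys = [1.0 / f for f in fractions_bound]
    slope, intercept = linear_fit(xs, ys)
    return slope / intercept

# Synthetic titration: 5-fold dilutions (pM) of a peptide with Kd = 0.6 pM.
true_kd = 0.6
concs = [50.0 / 5 ** i for i in range(6)]
fractions = [c / (c + true_kd) for c in concs]
print(apparent_kd(concs, fractions))
```

Dividing the slope by the intercept, instead of taking the slope alone, makes the estimate tolerant of a fitted intercept that differs slightly from 1, for example when the maximum shift is a little below 100%.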

Example 10: Binding Affinity of the Control Construct ZIF-GAC

In order to compare the binding affinities of the various constructs described
here, the ZIF-GAC peptide is used as a control. This peptide may be thought of as a
pair of three-finger peptides, and accordingly may be designated as 2x3F. The
ZIF-GAC construct is tested for binding to the binding site bsC and to the ZIF binding site
alone (bsA). The results are shown in Figure 9A. Figure 9A shows that the composite
site bsC is bound 125-500 fold more tightly than the 9 bp bsA site. This result is
comparable to that observed when the experiment of Kim and Pabo (1998, Proc. Natl.
Acad. Sci. USA 95, 2812-2817) is repeated using our methods of protein production
and bandshift, i.e. testing the ZIF-NRE peptide for binding to its composite site versus
the ZIF wt site.

Example 11: Binding Affinities of Constructs 3x2F ZGS and 3x2F ZGL 

The binding affinities of ZIF-GAC, 3x2F ZGS and 3x2F ZGL peptides for a
contiguous 18 bp site (bsC) and the 9 bp ZIF binding site (bsA) alone are determined.
Serial five-fold dilutions of peptide are made and incubated with 0.13 nM binding site.
Significantly, the results show that the 3x2F peptides bind the contiguous 18 bp site at
least as tightly as the 2x3F ZIF-GAC peptide (Figures 9A and 9B). Moreover, the
3x2F peptides display greater selectivity for the 18 bp site over the 9 bp site than does
the 2x3F peptide. The affinity of the 3x2F peptides for the 9 bp half-site is reduced due
to the extended linker sequence between fingers 2 and 3 of ZIF. The expression level
of the 3x2F ZGL peptide is approximately half that of the ZIF-GAC and 3x2F ZGS
peptides in this study, which accounts for its slightly weaker apparent affinity
(expression data not shown).

Example 12: Binding Affinities of Constructs 3x2F ZGS, 3x2F ZGL and 3x2F 
ZGXL 

The next experiment is designed to determine whether 3x2F peptides can be
used to bind noncontiguous sites with two separate regions of unbound DNA. The
constructs used in this study are 3x2F ZGS, 3x2F ZGL and 3x2F ZGXL, and are
targeted to the sequences of bsC, bsD and bsE. These sequences can be described as
comprising three sets of 6 bp sub-sites, which are either contiguous, separated by 1 bp
or separated by 2 bps of unbound DNA.

As shown in Figure 9B, the results demonstrate that the 3x2F ZGS and 3x2F
ZGL peptides bind the contiguous 18 bp site (bsC) equally tightly (taking into account
the different protein expression levels). We also find that the 3x2F ZGL peptide can
bind the non-contiguous site (bsD) as tightly as it does the contiguous 18 bp site bsC
(see Figures 9B and 10). However, the 3x2F ZGS peptide binds bsD over 125-fold
more weakly than it does bsC (compare left hand panels of Figure 9B and Figure 10).
This is in accordance with the fact that the short, five amino acid synthetic linkers
within 3x2F ZGS are unable to span 1 bp of DNA, and therefore the 3x2F ZGS peptide
binds the bsD site through only one pair of fingers.

Figure 11 shows that the 3x2F ZGXL peptide can bind the noncontiguous site
(bsD) as tightly as it does the contiguous 18 bp site bsC. 3x2F ZGXL binds the
non-contiguous site bsD approximately as tightly as the 3x2F ZGS peptide binds the
contiguous 18 bp site, bsC. However, the 3x2F ZGXL peptide binds bsE (containing 2
base pair gaps between target subsites) approximately 500-fold less tightly than it does
bsC and bsD, as shown in Figure 11. This is presumably because it can only bind bsE
through 2 fingers.

Example 13: Binding Affinities of Constructs 3x2F ZGSL and 3x2F ZGLS 

As a continuation of the above experiment, 3x2F peptides are constructed with
different combinations of engineered linkers within a ZIF-GAC fusion peptide. In the
construct 3x2F ZGSL the first two pairs of fingers are separated by a short
(-GG(E/Q)KP-) linker and the second two pairs are separated by a longer
(-GGSG(E/Q)KP-) linker (see Figure 6). In the construct 3x2F ZGLS the first two pairs
of fingers are separated by a long (-GGSG(E/Q)KP-) linker and the second two pairs are
separated by a shorter (-GG(E/Q)KP-) linker (see Figure 7).

These two peptides are tested for binding to binding sites bsF, which has a 1 bp
gap between the first two 6 bp subsites, and bsG, which has a 1 bp gap between the
second two 6 bp subsites (see Table 1). As expected, given the previous observations,
the results demonstrate that the binding of arrays of zinc finger pairs can be tailored to
suit the length of gap between 6 bp binding subsites. Figure 12 shows the results of a
gel shift experiment testing the binding of 3x2F ZGSL peptide to bsD, bsE and bsF,
which is through 4, 2 and 6 fingers respectively. From the binding patterns it can be
seen that the affinity of the 6-finger bound complex (3x2F ZGSL on bsF, right hand
panel) is approximately 10-fold higher than the 4-finger bound complex (3x2F ZGSL
on bsD, middle panel) and 125-500 fold stronger than the 2-finger bound complex
(3x2F ZGSL on bsE, left hand panel).
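
The relationship between linker length, gap length and the number of fingers engaged can be summarised in a simple rule-based model. The Python sketch below is an interpretation of the gel-shift observations in Examples 12 and 13, not a measured or mechanistic model. It assumes that a short linker spans only contiguous subsites, that the long and extra-long linkers span a gap of up to 1 bp, and that the peptide binds through the longest run of adjacent two-finger units whose gaps can all be spanned. Linkers are listed in the 5' to 3' order of the DNA site; because the ZIF fingers occupy the 3' part of the site, the GAC-clone linker corresponds to the first (5') junction. All names are hypothetical.

```python
# Largest gap (bp) each linker type is assumed to span, inferred from the gel shifts.
MAX_GAP = {"S": 0, "L": 1, "XL": 1}

# Linker types at the 5' and 3' junctions of the DNA site for each construct.
CONSTRUCTS = {
    "3x2F ZGS": ("S", "S"),
    "3x2F ZGL": ("L", "L"),
    "3x2F ZGXL": ("XL", "XL"),
    "3x2F ZGSL": ("L", "S"),
    "3x2F ZGLS": ("S", "L"),
}

# Gap lengths (bp) at the 5' and 3' junctions of each site in Table 1.
SITE_GAPS = {"bsC": (0, 0), "bsD": (1, 1), "bsE": (2, 2), "bsF": (1, 0), "bsG": (0, 1)}

def fingers_bound(linkers, gaps):
    """Fingers engaged = 2 x longest run of units joined by spannable gaps."""
    best = run = 1
    for linker, gap in zip(linkers, gaps):
        run = run + 1 if gap <= MAX_GAP[linker] else 1
        best = max(best, run)
    return 2 * best

for site in ("bsD", "bsE", "bsF"):
    print(site, fingers_bound(CONSTRUCTS["3x2F ZGSL"], SITE_GAPS[site]))
```

Under these assumptions the model reproduces the 4, 2 and 6 finger binding modes described for 3x2F ZGSL on bsD, bsE and bsF, and the single-pair binding of 3x2F ZGS to bsD described in Example 12.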



Similarly, 3x2F ZGLS peptide is tested for binding to bsD, bsE and bsG, which
is through 4, 2 and 6 fingers respectively. It is found that the affinity of binding of
3x2F ZGLS is strongest for bsG, followed by bsD and lastly bsE, with relative
affinities similar to those obtained from 3x2F ZGSL above.

Example 13A: Binding Affinity of 3x2F ZGS and Zif-GAC

A preliminary experiment is conducted using the three-finger Zif268 peptide
against its 9 bp binding site as a form of 'protocol calibration'. This gives a value for
the Kd of Zif268 of 0.45 nM, which is within the range expected for this peptide.

To determine the binding specificity of different styles of six-finger peptides,
the 3x2F ZGS and Zif-GAC peptides are first used in gel-shift experiments with the 9
bp Zif268 half-site, and a full 18 bp binding site (bsC, also termed "123456"). These
results show that the 3x2F ZGS and 2x3F Zif-GAC peptides bind their full-length
target site with similar affinities, of 0.6 and 1.4 pM respectively (Table 4 below).
However, their affinities for the Zif268 half-site are dramatically different. The 2x3F
Zif-GAC peptide binds with an affinity of approximately 2.2 nM (which is within the
range expected), but the 3x2F ZGS peptide binds with an affinity of about 110 nM.
This affinity is so weak that it is difficult to quantify using this system. From these
data it can be seen that the 3x2F peptide discriminates between the two sites over
100-fold more strongly than the 2x3F peptide.

To further study the specificity of the two constructs, the 3x2F and 2x3F
peptides are targeted against binding sites that have been mutated in the region
normally bound by finger 4. These results show that the 3x2F ZGS peptide binds to the
site with a 3 bp region mutated, 123///56, with an affinity of 890 pM. Meanwhile, it
binds to a site with this 3 bp region deleted, 12356, with an affinity of 22 nM (see
Table 5 below). Its affinities for sites with 1 or 2 bp deletions are 270 pM and 630 pM
respectively. Hence, the affinities of 3x2F ZGS for these mutant sequences are
between 450 and 37,000-fold weaker than for the correct binding sequence. In
contrast, the 2x3F Zif-GAC peptide binds 123///56, 123//56, and 123/56 with affinities
of 15, 14 and 14 pM respectively. This is just 10-fold weaker than that for its correct
binding site. The 2x3F Zif-GAC peptide shows a further reduction in affinity for the
12356 binding site, but this sequence is still bound more than 60 times more strongly
than it is bound by 3x2F ZGS. The gel-shift data in Figure 25 demonstrate the relative
binding affinities of the 2x3F Zif-GAC and 3x2F ZGS peptides for these binding sites.
All these data serve to emphasise the enhanced specificity of the 3x2F construct for
sequences that resemble its correct target site.



Binding Site       Binding Site                 Apparent Kd (pM)
Name               Sequence*                    3x2F ZGS      2x3F Zif-GAC

bsA (ZIF)          GCG TGG GCG                  1.1 x 10^5    2200
123456 (bsC)       GCG GAC GCG GCG TGG GCG      0.6           1.4
123///56 (bs4)     GCG GAC ATC GCG TGG GCG      890           15
123//56 (bs3)      GCG GAC TC GCG TGG GCG       270           14
123/56 (bs2)       GCG GAC T GCG TGG GCG        630           14
12356              GCG GAC GCG TGG GCG          2.2 x 10^4    360



Table 4. The binding site sequences used in gel-shift experiments with the 3x2F ZGS 
and 2x3F Zif-GAC peptides and the binding affinities obtained. * Binding site residues 
which are mutated (and subsequently deleted) are underlined. 
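
The fold-differences quoted in Example 13A are ratios of the apparent Kd values in Table 4. The following Python sketch reproduces that arithmetic. The dictionary layout and the function name `fold_weaker` are illustrative assumptions, and the 3x2F ZGS value for bsA is entered as 1.1 x 10^5 pM, corresponding to the "about 110 nM" stated in the text.

```python
# Apparent Kd values (pM) transcribed from Table 4.
# The data structure is illustrative; only the numbers come from the table.
KD_PM = {
    "3x2F ZGS": {"bsA": 1.1e5, "123456": 0.6, "123///56": 890.0,
                 "123//56": 270.0, "123/56": 630.0, "12356": 2.2e4},
    "2x3F Zif-GAC": {"bsA": 2200.0, "123456": 1.4, "123///56": 15.0,
                     "123//56": 14.0, "123/56": 14.0, "12356": 360.0},
}


def fold_weaker(peptide: str, site: str, reference: str = "123456") -> float:
    """Return Kd(site) / Kd(reference), i.e. how many times weaker the site is bound."""
    table = KD_PM[peptide]
    return table[site] / table[reference]


if __name__ == "__main__":
    for peptide, table in KD_PM.items():
        for site in table:
            if site != "123456":
                ratio = fold_weaker(peptide, site)
                print(f"{peptide:13s} {site:9s} {ratio:10.0f}-fold weaker")
    # Relative discrimination of the half-site (bsA) by the two designs.
    relative = fold_weaker("3x2F ZGS", "bsA") / fold_weaker("2x3F Zif-GAC", "bsA")
    print(f"half-site discrimination, 3x2F relative to 2x3F: {relative:.0f}x")
```

Expressing each affinity relative to the full-length site (123456) allows the two peptide designs to be compared on the same scale even though their absolute affinities differ.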



Example 13B. Binding of Non-Contiguous Sequences 

A second set of binding studies is conducted to demonstrate the ability of the
3x2F peptides to accommodate one or more regions of unbound DNA within their
recognition sequence. First the 3x2F ZGS and ZGL peptides are titrated against
12/34/56 (three 6 bp subsites separated by 1 bp, which is represented by a single '/' in
the binding site name) and 12//34//56 (three 6 bp subsites separated by 2 bps) binding
sites. The results in Table 5 show that the 3x2F ZGS peptide - which is designed to
target only the contiguous 123456 site - is unable to accommodate either 1 bp or 2 bp
gaps between the two-finger subsites. The 3x2F ZGL peptide, however, binds the
12/34/56 site with an affinity of approximately 5 pM, but is also unable to bind tightly
to the site with 2 bp gaps. Next, the 3x2F ZGSL and 3x2F ZGLS peptides are targeted
against the three non-contiguous sequences: 1234/56, 12/3456 and 12//34//56. These
sites are bound by the 3x2F ZGSL peptide with affinities of approximately 3 pM, 73
pM and 12 nM, which is in accordance with the binding of 6, 4 and 2 fingers
respectively. 3x2F ZGLS shows a similar trend in binding affinities. These experiments
demonstrate that 3x2F peptides can bind contiguous 18 bp sites, but are also unique
amongst the six-finger peptides reported to date in being able to bind sequences with
two regions of unbound DNA with high affinity.



Binding Site      Binding Site                     Apparent Kd† (pM)
Name              Sequence*                        3x2F ZGS     3x2F ZGL     3x2F ZGSL    3x2F ZGLS

123456 (bsC)      GCG GAC GCG GCG TGG GCG          0.6          0.9          ND           ND
12/34/56 (bsD)    GCG GAC T GCG GCG T TGG GCG      1.8 x 10^?   5            110          120
12//34//56 (bsE)  GCG GAC TC GCG GCG TC TGG GCG    ND           1.1 x 10^4   1.2 x 10^4   1.2 x 10^4
1234/56 (bsF)     GCG GAC T GCG GCG TGG GCG        54           ND           3            89
12/3456 (bsG)     GCG GAC GCG GCG T TGG GCG        77           ND           73           5

Table 5. The binding site sequences used in gel-shift experiments with the
3x2F peptides and the binding affinities determined. *Designed gaps in the target
sequence are shown in bold. †ND (not done) represents experiments for which Kds are
not calculated.
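
The binding site names used in Tables 4 and 5 encode the layout of each target: every digit stands for one finger subsite and every '/' for one base pair that is not matched to a finger. A minimal Python sketch of that naming convention follows. The function names are hypothetical, and the value of 3 bp per finger is an assumption taken from the 6 bp two-finger and 9 bp three-finger subsites described in the text.

```python
import re

BP_PER_FINGER = 3  # assumption: each finger digit corresponds to a 3 bp subsite


def parse_site_name(name: str):
    """Split a name such as '12//34//56' into finger groups and gap lengths."""
    tokens = re.findall(r"\d+|/+", name)
    groups = [t for t in tokens if t.isdigit()]
    gaps = [len(t) for t in tokens if t.startswith("/")]
    return groups, gaps


def site_length(name: str) -> int:
    """Total site length in bp: 3 bp per finger digit plus 1 bp per '/'."""
    groups, gaps = parse_site_name(name)
    fingers = sum(len(g) for g in groups)
    return fingers * BP_PER_FINGER + sum(gaps)


if __name__ == "__main__":
    for name in ["123456", "12/34/56", "12//34//56", "1234/56", "12/3456"]:
        groups, gaps = parse_site_name(name)
        print(f"{name:11s} groups={groups} gaps={gaps} length={site_length(name)} bp")
```

The computed lengths can be checked against the sequences listed in the tables, for example 20 bp for 12/34/56 and 19 bp for 1234/56.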

It appears that the more rigid nature of the 2x3F Zif-GAC peptide means that a
mutation in the binding site of one finger is 'felt' only by that finger, so that the
123///56 site is bound with the extremely high affinity of 15 pM. In contrast, the results
above show that the more sensitive design of the 3x2F peptides means that a mutation
in the binding sequence of a single finger weakens the entire two-finger unit. Thus, the
3x2F ZGS peptide binds the same site with an affinity of 890 pM. The large reduction
in affinity of the 3x2F ZGS peptide for the Zif268 half-site must be attributed to the
extended linker sequence between fingers 2 and 3. Presumably this linker reduces the
co-operative binding effect of the adjacent fingers, such that finger 3 of the peptide
adds nothing to the binding of the half-site. Meanwhile, the unbound fingers probably
'drag' on the complex to help pull the peptide off the DNA. The higher affinity of the
3x2F peptides for other sites that are bound by only two fingers (such as the 3x2F ZGS
peptide against the 12/34/56 site) presumably arises because there are three separate
two-finger binding sites present in the sequence.

WO 01/53480                                                    PCT/GB01/00202

Example 14: Binding Affinities of Construct 3x1F ZIF

A peptide denoted 3x1F ZIF (Figure 8) is constructed by inserting a single
glycine residue within each of the natural linkers in the wt ZIF gene. A further
extension of this design is also used to create 6x1F ZG, which is a six-finger ZIF-GAC
clone containing a glycine insertion within every linker peptide. The binding affinity
of the 3x1F peptide for the 9 bp ZIF site (bsA) is tested, and the construct is shown to
bind the substrate.

Example 15. Structured Linkers 

The experiments described in the following Examples seek to increase the
utility of poly-zinc finger peptides by creating fusion peptides that are able to bind
with high affinity to target sequences in which their binding subsites are separated by
long (up to 10 bp) stretches of DNA. The Examples utilise structured linkers which
are believed to show a preference for a particular length of DNA span, so that they
maintain a high degree of specificity. The crystal structure of the first six fingers of
TFIIIA bound to DNA (Nolte, R. T., Conlin, R. M., Harrison, S. C. & Brown, R. S.
(1998) Proc. Natl. Acad. Sci. USA 95, 2938-2943) indicates that TFIIIA finger 4
may be a suitable candidate for a structured linker to span long (> 5 bp) stretches of
DNA.

A fusion peptide comprising the first four fingers of TFIIIA and the three
fingers of Zif268, called TF(1-4)-ZIF, is first created. This is shown to bind DNA
with high affinity and shows a preference for sites containing 7 or 8 bps of
non-bound DNA. In contrast, a similar construct that contains a 20 residue flexible linker,
TF(1-3)-flex-ZIF, is seen to bind its full-length target sites somewhat more weakly. The
data in these Examples suggest that TFIIIA finger 4 is a suitable 'structured' linker
for spanning long stretches of DNA, and furthermore, that TF(1-4)-ZIF would make a
good scaffold for 'designer' transcription factors that bind DNA with 7 or 8 bps of
non-bound DNA.

The Examples also test the ability of a zinc finger module from Zif268 to act as
a structured linker. A zinc finger from Zif268 is mutated to make it
non-sequence-specific, and then used to link the three wild-type fingers of Zif268 to a
three-finger mutant of Zif268 (GAC). This 'serine-finger' is expected to sit in the major
groove, spanning 3 or 4 bps of DNA. Surprisingly, this new peptide is found to be able to
bind with similar affinity to the continuous 18 bp sequence comprising the Zif268 and GAC
sites, and to all the non-contiguous sites with 1-10 bp gaps. The fact that this peptide
can bind tightly to the contiguous binding site and the sites with just 1 or 2 bp gaps
suggests that the 'serine-finger' is able to flip out of the major groove to make space
for the binding of its neighbouring fingers. These data indicate that within a zinc finger
array redundant fingers can make way for stronger DNA-binding domains. When the
binding subsites are separated by 7-10 bps of DNA it is likely that the redundant finger
lies across the surface of the DNA, in a manner analogous to TFIIIA finger 4 (15).

The Examples also describe a fusion construct, ZIF-F4-GAC, which uses
TFIIIA finger 4 as a linker between two Zif-type domains. This peptide displays little
discrimination for the length of DNA span separating the binding subsites, although a
trend in the binding affinities of the peptide is apparent. All peptides connected by
zinc finger modules show a preference for sequences containing 3 bp or over 6 bp
gaps. These probably correspond to binding modes in which the zinc finger linker
either sits 'normally' in the major groove or bridges the minor groove.

It has been proposed that the relatively hydrophobic linkers flanking TFIIIA
finger 4 may constrain finger 4 into its orientation across the minor groove, as
observed in the crystal structure of Nolte et al. (1998). Hence, the Examples also
describe investigations into the conformational freedom of zinc fingers by swapping
the linker sequences flanking wild-type TFIIIA finger 4 and the 'serine-finger'. It is
found that the linker sequences flanking TFIIIA finger 4 only confer a small degree of
structural rigidity, which is most apparent when the finger is forced to take up
unfavourable conformations.

A predicted benefit of using structured linkers is that of increased binding
affinity over peptides containing long, flexible linkers. This is confirmed by the
Examples, which disclose binding results from the two peptides containing 20 residue
flexible linkers; these are found to bind their full-length targets between 3 and 10-fold
more weakly than peptides with structured linkers.

Poly-zinc finger peptides are likely to become increasingly important in gene
therapy and the creation of transgenic organisms. Given the difficulty of engineering
zinc finger peptides to bind to all possible DNA sequences (Choo, Y. & Klug, A.
(1994) Proc. Natl. Acad. Sci. USA 91, 11168-11172; Segal, D. J., Dreier, B., Beerli,
R. R. & Barbas, C. F. III (1999) Proc. Natl. Acad. Sci. USA 96, 2758-2763), it is
advantageous to synthesise peptides capable of spanning long regions of DNA, while
still binding with high affinity. This will allow the selection of favourable DNA target
sites that may be several nucleotides apart. The Examples show that 'structured'
linkers may be incorporated into zinc finger fusion peptides. These allow the separate
DNA-binding domains to bind with high affinity to sites separated by 1 to 10 bps of
non-bound DNA. The ability of these structured-linker fusion peptides to span such
long stretches of DNA is particularly advantageous for the targeting of natural
promoter sequences. For example, the zinc finger protein Sp1 binds GC box DNA,
which can appear in multiple copies in the promoter sequences upstream of a variety
of cellular and viral genes (Kadonaga, J. T., Jones, K. A. & Tjian, R. (1986) Trends
Biochem. Sci. 11, 20-23; Bucher, P. (1990) J. Mol. Biol. 212, 563-578). Similarly, the
promoter for the HSV40 early genes contains three 21 bp repeats which include GC
boxes. Linking zinc finger peptides that recognise such regions could create powerful
'designer' transcription factors. TFIIIA finger 4 may be a particularly useful
'structured' linker as it shows a marked preference for 7 or 8 bp DNA spans.

The Examples also indicate that weakly binding zinc fingers are able to 'flip' in
or out of the DNA major groove to accommodate neighbouring fingers within the
DNA-binding domain. This means that certain zinc finger arrays will bind reasonably
tightly to truncated or mutated binding sites. This feature of zinc-finger arrays may be
taken advantage of, for instance to engineer zinc fingers which bind to a series of
related, but different binding sites. Nature almost certainly takes advantage of this
phenomenon to evolve zinc finger transcription factors that regulate multiple genes
from non-identical promoters. Furthermore, many natural polydactyl proteins that have
been isolated contain zinc fingers whose roles are not yet understood. For example,
GLI contains five tandem zinc fingers, but in the crystal structure of this protein only
two of these bind to DNA in the classical, base-specific manner (Pavletich, N. P. &
Pabo, C. O. (1991) Science 261, 1701-1707). The results presented in the Examples
also suggest that there may be a broad repertoire of roles for zinc finger domains
within the cell. The Examples also show that polydactyl peptides comprising flexible
linkers may be created that bind with far greater specificity than previously designed
six-finger peptides.

Example 15A: Construction of TFIIIA(F1-4)-ZIF Zinc Finger Construct

The TFIIIA(F1-4) construct is made by fusing the first four fingers of TFIIIA
N-terminally to the three fingers of wt ZIF. The natural linker between fingers 4 and 5
of TFIIIA is used as the linker between TFIIIA finger 4 and ZIF finger 1. However,
the construct is designed such that the entire TFIIIA finger 4 region acts as a structured
linker between TFIIIA fingers 1-3 (which bind DNA) and wt ZIF fingers 1-3 (which
also bind DNA).

The construction of TFIIIA(F1-4) is described with reference to Figures 13 and
15. As shown in Figure 13, the TFIIIA(F1-4) construct is made by PCR using two
pairs of primers A + a and B + b to amplify wild type TFIIIA and wild type ZIF
templates respectively. Primers A and b comprise restriction sites for NdeI and NotI
respectively. The respective amplification products are then subjected to overlap PCR,
with a template fill-in step. Finally, the products are amplified with end primers A + b,
digested with NotI and NdeI, and ligated into NotI/NdeI digested pCITE-4b vector
(Amersham International Plc).

Primer A (SEQ ID NO: 29):

                                       Nde I
5' ACT TCG GAA TTC GCG GCC CAG CCG GCC CAT ATG GGA GAG AAG GCG CTG CCG GTG 3'

Primer a (SEQ ID NO: 30):

5' GCA AGC ATA CGG CAG CTG CTG TGT GTG ACT G 3'

Primer B (SEQ ID NO: 31):

5' ACA CAG CAG CTG CCG TAT GCT TGC CCT GTC GAG TCC 3'

Primer b (SEQ ID NO: 32):

                        Not I      STOP
5' GAG TCA TTC AAG CTT TGC GGC CGC TTA GTC CTT CTG TCT TAA ATG GAT TTT GG 3'
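
The primer design described above can be checked computationally. The sketch below, written in Python, looks for the NdeI (CATATG) and NotI (GCGGCCGC) recognition sequences in the end primers A and b, and measures how many bases of the internal primers a and B are complementary, which is the region through which the two amplification products would anneal during overlap PCR. The helper names (`clean`, `revcomp`, `overlap`) are illustrative and not part of the original disclosure.

```python
def clean(seq: str) -> str:
    """Remove the spaces from a primer written in triplets."""
    return seq.replace(" ", "").upper()


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def overlap(reverse_primer: str, forward_primer: str) -> int:
    """Longest suffix of revcomp(reverse_primer) that is a prefix of forward_primer."""
    top = revcomp(reverse_primer)
    for k in range(min(len(top), len(forward_primer)), 0, -1):
        if top[-k:] == forward_primer[:k]:
            return k
    return 0


# Primer sequences transcribed from SEQ ID NOs: 29-32.
PRIMERS = {
    "A": clean("ACT TCG GAA TTC GCG GCC CAG CCG GCC CAT ATG GGA GAG AAG GCG CTG CCG GTG"),
    "a": clean("GCA AGC ATA CGG CAG CTG CTG TGT GTG ACT G"),
    "B": clean("ACA CAG CAG CTG CCG TAT GCT TGC CCT GTC GAG TCC"),
    "b": clean("GAG TCA TTC AAG CTT TGC GGC CGC TTA GTC CTT CTG TCT TAA ATG GAT TTT GG"),
}

NDE_I = "CATATG"    # NdeI recognition sequence
NOT_I = "GCGGCCGC"  # NotI recognition sequence

if __name__ == "__main__":
    print("NdeI site in primer A:", NDE_I in PRIMERS["A"])
    print("NotI site in primer b:", NOT_I in PRIMERS["b"])
    print("a/B overlap (nt):", overlap(PRIMERS["a"], PRIMERS["B"]))
```

A non-zero overlap between primers a and B indicates that the 3' end of the first product and the 5' end of the second share sequence, as required for the template fill-in step.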

Example 16: Construction of GAC-F4-ZIF Zinc Finger Construct 

The GAC-F4-ZIF construct is made by joining the GAC-clone to the
N-terminus of wt ZIF, using the entire TFIIIA finger 4 peptide (including its natural
flanking linker sequences) as a structured linker.

The construction of GAC-F4-ZIF is described with reference to Figures 14 and
16. As shown in Figure 14, the GAC-F4-ZIF construct is made by PCR using two pairs
of primers C + c and D + d to amplify the GAC clone and TFIIIA(F1-4) templates
respectively. Primers C and d comprise restriction sites for NdeI and NotI respectively.
The respective amplification products are then subjected to overlap PCR, with a
template fill-in step. Finally, the products are amplified with end primers C + d,
digested with NotI and NdeI, and ligated into NotI/NdeI digested pCITE-4b vector
(Amersham International Plc).

Primer C (SEQ ID NO: 33):

                                       Nde I
5' ACT TCG GAA TTC GCG GCC CAG CCG GCC CAT ATG GCA GAA CGC CCG TAT GCT TG 3'

Primer c (SEQ ID NO: 34):

5' CAC ATA GAC GCA GAT CTT GAT GTT ATG GAT TTT GGT ATG CCT CTT GCG 3'

Primer D (SEQ ID NO: 35):

5' CAT AAC ATC AAG ATC TGC GTC TAT GTG 3'

Primer d (SEQ ID NO: 36):

                        Not I      STOP
5' GAG TCA TTC AAG CTT TGC GGC CGC TTA GTC CTT CTG TCT TAA ATG GAT TTT GG 3'

Example 17: Construction of ZIF-ZnF-GAC Zinc Finger Construct 



To create the ZIF-ZnF-GAC construct, primers A + b and C + d are used to
amplify the wild type ZIF and GAC clone sequences, respectively. These are then
digested with Eag I to create sticky ends. Next, the "neutral" zinc finger (ZnF) is
produced by annealing the following complementary oligonucleotides: 5' GG CCG
TTC CAG TGT CGA ATC TGC ATG CGT AAC TTC AGT TCT AGT AGC TCT
CTT ACC AGC CAC ATC CGC ACC CAC ACA GGT GAG C 3' (SEQ ID NO: 37)
and 5' GG CCG CTC ACC TGT GTG GGT GCG GAT GTG GCT GGT AAG AGA
GCT ACT AGA ACT GAA GTT ACG CAT GCA GAT TCG ACA CTG GAA C 3'
(SEQ ID NO: 38), which create Eag I sites at each end. The complete ZIF-ZnF-GAC
construct is finally generated by joining the "neutral" finger to the Eag I cut ZIF and
GAC sequences. This construct is then digested with Nde I and Not I and ligated into
similarly digested pCITE-4b vector (Amersham International Plc).
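
As a consistency check on the two annealed oligonucleotides, the Python sketch below tests that each begins with a 5' GGCC overhang (the sticky end left by Eag I, which recognises CGGCCG), that the remaining bases of the two strands are exact reverse complements, and translates the top strand in the reading frame that begins at CCG. The standard genetic code is assumed, and the choice of reading frame and all function names are illustrative assumptions rather than statements from the original text.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def clean(seq: str) -> str:
    """Remove spaces from an oligonucleotide written in triplets."""
    return seq.replace(" ", "").upper()


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str, offset: int = 0) -> str:
    """Translate complete codons of seq, starting at the given offset."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Oligonucleotides transcribed from SEQ ID NOs: 37 and 38.
TOP = clean("GG CCG TTC CAG TGT CGA ATC TGC ATG CGT AAC TTC AGT TCT AGT AGC TCT "
            "CTT ACC AGC CAC ATC CGC ACC CAC ACA GGT GAG C")
BOTTOM = clean("GG CCG CTC ACC TGT GTG GGT GCG GAT GTG GCT GGT AAG AGA "
               "GCT ACT AGA ACT GAA GTT ACG CAT GCA GAT TCG ACA CTG GAA C")

OVERHANG = "GGCC"  # 5' overhang produced by Eag I digestion

if __name__ == "__main__":
    print("5' overhangs:", TOP[:4], BOTTOM[:4])
    print("duplex region complementary:", revcomp(TOP[4:]) == BOTTOM[4:])
    print("encoded peptide:", translate(TOP, offset=2))
```

In this frame the translated peptide contains a run of serine residues in the region corresponding to the DNA-contacting positions, in keeping with the description of the "neutral" finger.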
Example 17A. Construction of ZIF-F4-GAC, ZIF-F4mut-GAC, ZIF-mutZnF-GAC,
TF(1-3)-flex-ZIF and ZIF-flex-GAC



ZIF-F4-GAC and ZIF-F4mut-GAC

The ZIF-F4-GAC and ZIF-F4mut-GAC constructs are made by three separate
PCR amplifications of the three fingers of Zif268, the three fingers of a Zif268 mutant
peptide (GAC), and the fourth finger of TFIIIA. Two sequential overlap PCR reactions
are then used to fuse the separate units together, creating seven-finger constructs.

ZIF-mutZnF-GAC

The ZIF-mutZnF-GAC construct is made by PCR amplification of the three
fingers of wt Zif268 and the Zif268 mutant (GAC), creating Eag I sites at their C- and
N-termini respectively. The structured linker, ZnF, described above in Example 17, is
inserted between the Eag I cut ZIF and GAC three-finger units to create the complete
seven-finger construct. The ZIF-mutZnF-GAC clone is made by PCR amplification of
the ZIF, GAC, and ZnF structured linker fragments to create mutant ends. These three
fragments are joined by two sequential rounds of overlap PCR as above.



TF(1-3)-flex-ZIF and ZIF-flex-GAC

The TF(1-3)-flex-ZIF and ZIF-flex-GAC constructs are created by PCR
amplification of the first three fingers of TFIIIA, the three fingers of Zif268 or the
three fingers of the GAC-clone - using appropriate oligonucleotides - which are
designed to generate the flexible 20 amino acid linker peptide, -TG(GSG)5ERP-, and
Eag I sites at the position to be joined. The required six-finger constructs are
synthesised by digesting the PCR products with Eag I and ligating at that site. All
zinc-finger constructs are digested with Xba I and Eco RI restriction enzymes and
inserted into the similarly digested, eukaryotic expression vector pcDNA 3.1(-)
(Invitrogen). The sequences of all constructs are confirmed by dideoxy sequencing.

Example 18: Binding Affinities of Construct TFIIIA(F1-4)-ZIF

The initial study on a structured-linker containing fusion peptide is conducted
on the TF(F1-4)-ZIF construct. This experiment is designed to investigate two
issues. First, can TFIIIA finger 4 be used successfully, outside its natural protein
context, to bridge a region of DNA within a non-contiguous binding site? Second,
what is the optimal DNA span of TFIIIA finger 4 within a synthetic fusion peptide?

The TF(F1-4)-ZIF peptide is targeted against non-contiguous binding sites
comprising the TFIIIA fingers 1-3 recognition site and the three-finger ZIF site,
separated by between 5 and 10 bps of unbound DNA (Table 2). The relative affinity of
the peptide for these sites is then compared with its affinity for the ZIF subsite bsA
alone. A selection of the gel shift results is shown in Figure 18, which shows that the
TFIIIA(F1-4)-ZIF construct can bind nucleic acid substrates consisting of TFIIIA and
ZIF subsites separated by 6 or 7 base pairs. From such gels it is clear that the DNA
span of TFIIIA finger 4 in this construct is as much as 10 bp. Non-contiguous binding
sites with 6-9 bps of intervening DNA can be bound, although the optimal spacing is
found to be 7 or 8 bp. These optimal sites are bound at least 125-fold more tightly than
the ZIF site alone.

The results of this experiment accord with the fact that the fourth finger of
TFIIIA is known not to bind DNA in a sequence-specific manner, and that this finger
jumps, spans or bridges the minor groove of DNA in the crystal structure of the first 6
fingers of TFIIIA (Nolte et al., 1998, Proc. Natl. Acad. Sci. USA 95, 2938-2943).

Example 19: Binding Affinities of Construct GAC-F4-ZIF 

To determine whether TFIIIA F4 would still function as a linker when taken
out of the context of neighbouring TFIIIA fingers, the GAC-F4-ZIF construct is made
(Figures 14 and 16). This construct can be thought of simply as two ZIF-based DNA
binding domains joined by a structured linker (in this case TFIIIA F4). As above, this
construct is tested for affinity against a range of sequences, comprising the appropriate
binding subsites separated by 6 to 11 bps of DNA (Table 3). In these studies TFIIIA
finger 4 is again demonstrated to be an effective linker. Results of gel-shift
experiments are shown in Figures 19 and 20. As before, the new peptide is shown to
bind its optimal, full-length target sites at least 125-fold more strongly than the 9 bp
ZIF site. In this case, however, the optimal DNA span is found to be 8 or 9 bps,
although 7-11 bp stretches could be spanned without a significant loss in binding
affinity.

Example 20: Binding Affinities of Construct ZIF-ZnF-GAC 

We next test the possibility that a natural zinc finger, of the type found in the
ZIF peptide, may function as a stable unit that spans 3 bps (or occasionally 4 bps) of
DNA while occupying the major groove. If so, a 'neutral' zinc finger module, i.e. one
that does not recognise a specific DNA sequence, might be used as a structured linker
to span 3 or 4 bps.

For this purpose a 'neutral' finger is created by replacing the DNA binding
residues (those at positions -1, 2, 3, and 6) of wild type ZIF268 finger 2 with serine
residues. Serine can act as either an H-bond acceptor or donor, and can therefore
interact with all four bases in DNA. This new finger, denoted "ZnF" and flanked by
two GERP linkers, is used to join the three-finger peptides of ZIF and the GAC-clone,
creating the seven-finger array ZIF-ZnF-GAC (Figure 17). This peptide is targeted
against non-contiguous sites comprising the 9 bp ZIF and GAC-clone recognition
sequences separated by 2, 3, 4 or 5 bps of DNA, and also sites bsA and bsC for
comparison (Figure 21). The results demonstrate that the peptide binds all full-length
target sites comprising the ZIF and GAC subsites, either adjacent or separated by up to
5 base pairs of unbound DNA, at least 500-fold more tightly than it does the ZIF site
alone. These results suggest that the peptide may bind the contiguous ZIF-GAC site
fractionally more weakly than it does the non-contiguous sites, but the difference (if
any) is slight. Hence, it appears that the "neutral" zinc finger linker is able to function
as an effective linker, either in or out of the DNA major groove.

Example 20A. Binding Affinity of TFIIIA / ZIF Fusion Peptides.

The TF(1-4)-ZIF and TF(1-3)-flex-ZIF peptides are tested against the
non-contiguous TF-5,6,7,8,9-Z sites. In these first experiments the DNA composition of
the non-bound region is based on the endogenous TFIIIA target site. The results clearly
show that the TF(1-4)-ZIF peptide has a preference for non-contiguous sites separated
by 7 or 8 bp gaps, which are bound with a Kd of approx. 3 pM (Table 6). The target
sites with 5, 6 or 9 bp gaps are bound at least 5-fold more weakly (Figure 27A). In
contrast, the TF(1-3)-flex-ZIF peptide shows no preference for a particular DNA span,
binding all non-contiguous sites with affinities of around 60 pM (Figure 27B). Further
studies are conducted on binding sites with various sequences in the non-bound region of
the DNA target site. These demonstrate that the peptides have no preference for particular
sequence compositions within this non-bound region (data not shown). Both constructs
bind the Zif268 half-site with similar affinity, as expected.

Example 20B. Binding Affinity of ZIF / GAC Fusion Peptides

The first binding study is conducted on ZIF-F4-GAC to determine the optimal
span of TFIIIA finger 4 in this construct. This peptide is titrated against the continuous
18 bp ZM binding site, and non-continuous binding sites with 1-10 bps of non-bound
DNA. Our results demonstrate that this peptide has little preference for a particular
span of DNA, although the highest affinity binding is observed for sites containing 3
bp or > 7 bp insertions (Table 7). The fact that this peptide is able to bind with such
high affinity to sites with gaps of less than 3 bp is highly unexpected. The slight
reduction in binding affinity observed in these examples is presumably because the 1-2
bp gaps are too small to accommodate a zinc finger in the DNA major groove. In these
circumstances it seems likely that the non-binding finger actually flips out of the DNA,
leaving the remaining fingers to bind the target site. The slight reduction in affinity for
sites with 5 or 6 bp gaps is probably because TFIIIA finger 4 has to stretch half a
helical turn around the DNA. For longer gaps the finger is likely to span the minor
groove as is seen in wild-type TFIIIA.

A further set of binding studies is then carried out on the construct containing
the non-specific zinc finger linker, ZIF-ZnF-GAC. Although this construct is expected
to target (primarily) non-contiguous sequences containing three or four base pairs of
non-bound DNA, it is tested against all of the binding sites from ZM to Z10M. Our
gel-shift data again demonstrate that this peptide is able to bind its optimal targets
with very high affinity (3-4 pM), and shows a similar trend in binding affinity to the
ZIF-F4-GAC peptide (Figure 27C). However, this peptide is able to bind its least
favourable sites with slightly greater affinity than observed for the previous peptide
(Table 7).

It was thought that the -NIKICV- and -TQQLP- linkers found either side of
wild-type TFIIIA finger 4 would be more structured than the flexible -TGERP- linkers
which flanked the serine-mutated finger of ZIF-ZnF-GAC. Therefore, the
ZIF-mutF4-GAC and ZIF-mutZnF-GAC peptides are synthesised and tested to determine
whether these linker sequences are responsible for the less selective binding of the
ZIF-ZnF-GAC peptide. These new peptides are targeted against all eleven binding
sequences, as above. The ZIF-mutZnF-GAC peptide is found to bind the Z5M and Z6M
binding sites with Kd's of 18 pM and 11 pM respectively. All other binding sites are bound
with very similar affinities to the ZIF-ZnF-GAC peptide (data not shown). By
comparison, the ZIF-mutF4-GAC peptide binds both the Z5M and Z6M sites with
apparent Kd's of 13 pM. From these data it appears that the -NIKICV- and -TQQLP-
linkers slightly weaken the binding of the peptides to DNA sequences with 5 or 6 bp
gaps. This may be because they are less flexible than the -TGERP- linkers, and are
less able to bend around the DNA helix. No differences in DNA-binding
characteristics for the different linker combinations are observed when the binding
subsites are located on approximately the same face of the DNA.

Finally, the ZIF-flex-GAC peptide is examined in the same way as the
structured-linker peptides above. This peptide, as with the TF(1-3)-flex-ZIF peptide,
displays no preference for a particular length of DNA span, and binds all sites with
affinities of approximately 50 pM. This 3-10 fold reduction in affinity - compared to
peptides connected by structured linkers - is probably due to the increased
conformational freedom of this peptide, which makes DNA binding less entropically
favourable.



Binding Site    Binding Site                Apparent Kd (pM)
Name            Sequence*                   TF(1-4)-ZIF    TF-flex-ZIF

ZIF             GCGTGGGCG                   2000           1800
TF5Z            GCGTGGGCGX5GGATGGGAGAC      21             63
TF6Z            GCGTGGGCGX6GGATGGGAGAC      17             68
TF7Z            GCGTGGGCGX7GGATGGGAGAC      3              57
TF8Z            GCGTGGGCGX8GGATGGGAGAC      3              61
TF9Z            GCGTGGGCGX9GGATGGGAGAC      15             58


Table 6. The binding site sequences used in gel-shift experiments with the 
TFIIIA-ZIF fusion peptides and the binding affinities obtained. *Non-bound DNA 
bases in the target sequence are shown by a bold 'X'. The exact base composition of 
these gaps is found to have no significant effect on peptide affinity. 
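
The spacing preference described in Example 20A can be read directly from the values in Table 6. The short Python sketch below identifies the gap lengths giving the lowest apparent Kd for each peptide and expresses the spread across the gap series as a ratio. The function names `best_gaps` and `selectivity` are hypothetical and serve only to illustrate the comparison.

```python
# Gap lengths (bp) and apparent Kd values (pM) transcribed from Table 6.
GAPS = [5, 6, 7, 8, 9]
KD_PM = {
    "TF(1-4)-ZIF": [21, 17, 3, 3, 15],
    "TF-flex-ZIF": [63, 68, 57, 61, 58],
}


def best_gaps(kds, gaps=GAPS):
    """Return the gap lengths that give the lowest Kd (tightest binding)."""
    lowest = min(kds)
    return [gap for gap, kd in zip(gaps, kds) if kd == lowest]


def selectivity(kds) -> float:
    """Ratio of the weakest to the tightest Kd across the gap series."""
    return max(kds) / min(kds)


if __name__ == "__main__":
    for peptide, kds in KD_PM.items():
        print(f"{peptide:12s} optimal gap(s): {best_gaps(kds)} bp, "
              f"spread: {selectivity(kds):.1f}-fold")
```

A spread close to 1 indicates no preference for a particular DNA span, as reported for the flexible-linker peptide, whereas a larger value reflects the length preference of the structured linker.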


Binding Site    Binding Site                  Apparent Kd (pM)
Name            Sequence*                     ZIF-F4-GAC    ZIF-ZnF-GAC

ZIF             GCGTGGGCG                     2200          2000
ZM              GCGGACGCGGCGTGGGCG            11            7
Z1M             GCGGACGCGXGCGTGGGCG           6             4
Z2M             GCGGACGCGX2GCGTGGGCG          7             6
Z3M             GCGGACGCGX3GCGTGGGCG          5             4
Z4M             GCGGACGCGX4GCGTGGGCG          13            3
Z5M             GCGGACGCGX5GCGTGGGCG          16            8
Z6M             GCGGACGCGX6GCGTGGGCG          17            7
Z7M             GCGGACGCGX7GCGTGGGCG          5             3
Z8M             GCGGACGCGX8GCGTGGGCG          5             6
Z9M             GCGGACGCGX9GCGTGGGCG          5             4
Z10M            GCGGACGCGX10GCGTGGGCG         4             3

Table 7. The binding site sequences used in gel-shift experiments with the
ZIF-GAC fusion peptides and the binding affinities obtained. *Non-bound DNA bases in
the target sequence are shown by a bold 'X'. The exact base composition of these gaps
is found to have no significant effect on peptide affinity.
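
The ZM to Z10M series in Table 7 follows a simple pattern: two 9 bp subsites separated by a run of n non-bound bases written as X. A small Python sketch of how such a series might be generated is shown below. The function names are hypothetical, the placeholder 'X' is retained because the text reports that gap composition has no significant effect, and the assignment of GCGGACGCG to the GAC-clone subsite is an assumption based on GCGTGGGCG being listed as the ZIF site.

```python
GAC_SUBSITE = "GCGGACGCG"  # assumed GAC-clone subsite (first 9 bp of ZM)
ZIF_SUBSITE = "GCGTGGGCG"  # ZIF subsite, as listed in the tables


def site_name(gap: int) -> str:
    """Name of the site with the given gap, following the ZM, Z1M, ... pattern."""
    return "ZM" if gap == 0 else f"Z{gap}M"


def zm_site(gap: int, filler: str = "X") -> str:
    """Build the target sequence with `gap` non-bound bases between the subsites."""
    if gap < 0:
        raise ValueError("gap must be non-negative")
    return GAC_SUBSITE + filler * gap + ZIF_SUBSITE


if __name__ == "__main__":
    for gap in range(11):
        sequence = zm_site(gap)
        print(f"{site_name(gap):5s} {sequence} ({len(sequence)} bp)")
```

Replacing the filler with a concrete base string would give an actual oligonucleotide sequence; any such choice is outside what the tables specify.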

Example 21: Binding Affinities of ZIF-GAC and 3x2F ZGS Peptides to Targets
with Deleted Subsequence

This example shows the differential effects of looping out of a single finger
from a zinc finger protein/DNA complex.

To investigate the effect of finger-flipping or looping in 2x3F and 3x2F
zinc-finger peptides, gel-shift experiments are conducted with the 2x3F ZIF-GAC peptide
and the 3x2F ZGS peptide against a selection of modified binding sites, bs1, bs2, bs3
and bs4 (Figures 22 and 23), as well as bsA and bsC as control sites. Figure 22 shows
results of gel-shift experiments in which the 2x3F ZIF-GAC peptide is tested for
binding to the 9 base pair ZIF binding site (target bsA), the 18 base pair ZIF-GAC
binding site (bsC), as well as bs1, bs2, bs3 and bs4, which comprise the ZIF-GAC bsC
sequence, but with the three base subsequence recognised by finger 4 of 2x3F
ZIF-GAC removed, and 0, 1, 2 or 3 base pairs respectively inserted in its place, while
Figure 23 shows corresponding experiments using 3x2F ZGS peptide.

By comparing the relative affinities of each peptide for the sites bs1-4 against
the designed, full-length binding site, bsC, the ability of zinc-finger peptides to
accommodate finger "flipping" can be demonstrated. The sequence of bs1 is similar to
that of bsC, but with the three bases recognised by finger 4 of the 3x2F ZGS or 2x3F
ZIF-GAC peptides completely removed. The sites bs2, bs3 and bs4 are identical to
bs1, except for the insertion of 1, 2 or 3 base pairs (respectively) in the region
normally bound by zinc-finger 4 of the fusion peptides; the inserted residues are
selected so that they would not be the same as the sequence recognised by finger 4. It
should be noted that the binding site of bs4 is the same length as bsC, but zinc-finger 4
will not contribute binding energy to the complex with this site. The other sites, bs1,
bs2 and bs3, are shorter by 3, 2 and 1 bps respectively.

The gel-shift results with the 2x3F ZIF-GAC and 3x2F ZGS peptides are
shown in Figures 22 and 23 respectively. Serial 5-fold dilutions of peptide are made
and incubated with 0.01 nM binding site. Significantly, the results demonstrate that the
3x2F ZGS peptide is far more selective for the correct, full-length binding site (bsC)
than is the 2x3F ZIF-GAC peptide.

The gel-shift results of Figure 23 show that the
3x2F ZGS peptide binds the incorrect, full-length binding site (bs4) approximately
125-fold more weakly than it does bsC; its binding is therefore relatively specific. It also
binds the sites bs3 and bs2 with almost identical affinity to bs4. (These sites are
truncated in the region normally bound by finger 4.) The shortest site, bs1, is bound at
least 625-fold less tightly than the correct binding sequence, bsC. The 3x2F ZGS
peptide clearly binds bs1 slightly more tightly than it does the ZIF site alone, but the
concentrations of protein and binding site used in these experiments are such that
binding to the ZIF site alone is barely detectable.

In contrast, the 2x3F ZIF-GAC
peptide binds the sequence of bs4 only 5-fold more weakly than it does bsC and, as
above, its affinity for the sites bs3 and bs2 is very similar to that for bs4,
demonstrating that it is relatively non-specific. The peptide shows reasonable
discrimination when targeted to the bs1 site, which it binds approximately 125-fold
more weakly than bsC.

These data clearly demonstrate that the individual zinc-fingers
within a zinc-finger array (such as the 2x3F ZIF-GAC and the 3x2F ZGS peptides) are
able to "flip" out of the DNA major groove, when they do not recognise the DNA
sequence presented to them, in order to allow the remaining zinc-fingers to bind in the
optimal conformation. The ability of the zinc-finger peptide to accommodate this
conformational change is dependent on the construction of the peptide. These results
show that the detrimental effects of finger "flipping" are far more pronounced in the
3x2F ZGS peptide than in the 2x3F ZIF-GAC peptide, demonstrating that 3x2F
peptides are far more specific than 2x3F peptides.
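
Because the peptide is titrated in serial 5-fold dilutions, the fold differences quoted in this example (5, 125 and 625) are whole powers of five. The short sketch below makes that arithmetic explicit; the function names are illustrative, and reading each fold difference as a number of dilution steps is an interpretation rather than something stated in the text.

```python
def fold_difference(dilution_steps, dilution_factor=5):
    """Fold change in peptide concentration across a number of serial dilution steps."""
    return dilution_factor ** dilution_steps


def steps_for_fold(fold, dilution_factor=5):
    """Inverse of fold_difference, for exact powers of the dilution factor only."""
    steps = 0
    while fold > 1:
        fold, remainder = divmod(fold, dilution_factor)
        if remainder:
            raise ValueError("not an exact power of the dilution factor")
        steps += 1
    return steps


for fold in (5, 125, 625):
    print(fold, "fold =", steps_for_fold(fold), "dilution step(s)")
```

On this reading, the 3x2F ZGS peptide loses three dilution steps of binding on bs4 where the 2x3F ZIF-GAC peptide loses one, a 25-fold difference in discrimination between the two architectures.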

Example 22. Use of Two Finger Library for Selection of Zinc Fingers

The simplest approach is to construct an isolated two finger library, comprising 
amino acids known to contribute to DNA-binding affinity and specificity. Such a 
library is constructed using suitable randomizations. A phage display library is 
constructed using methods known in the art, and a number of 6-7 bp DNA targets are 
used in selections that are carried out essentially as detailed in patent applications WO 
96/06166 and WO 98/53057. After the selection process is complete, a number of 
tightly binding zinc finger proteins are isolated. 

Example 23. Use of Combinatorial Library for Selection of Zinc Fingers 

We further demonstrate the construction of libraries for 2-finger domains
whose register of interaction is precisely fixed. This is achieved by employing "GCG"
anchors and two extensively-randomised zinc fingers. The libraries are designed to
take into account synergistic effects between zinc fingers, by modifying cross-strand
contacts from position 2. Consequently, position 2 of F2 is modified to Ser or Ala so
as to interact universally with either the C in the "GCG" anchor, or any base (N) in
the final target site sequence. Similarly, position 2 of F3 is modified to Ser or Ala so as
not to interfere with the selection of bases 4'X or 4X. Phage display libraries are
constructed using methods known in the art, and a number of DNA targets are used in
selections that are carried out essentially as detailed in patent applications WO
96/06166 and WO 98/53057. After the selection process is complete, a number of
tightly binding zinc finger proteins are isolated. After selecting against particular DNA
target sites, the genes for the appropriate 2-finger domains are easily recovered by
PCR.

Example 24. Use of Combinatorial Library for Selection of Zinc Fingers

Phage display libraries Lib 1/2 and Lib 2/3 are used to select 2-finger
construction units. More specifically, the libraries are used to select two finger units
that bind DNA sites of the form 5'-GXX XXX-3' or 5'-XXX XXG-3' (where X is any
base). Despite the fact that one base must be fixed as "G" in each target site, this still
allows 2048 of all the 4096 (= 4^6) possible 6-base 2-finger recognition sites to be
targeted. Phage display libraries are constructed using methods known in the art, and a
number of DNA targets are used in selections that are carried out essentially as
detailed in patent applications WO 96/06166 and WO 98/53057. After the selection
process is complete, a number of tightly binding zinc finger proteins are isolated.
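
The site count can be checked by direct enumeration. The sketch below is an illustration, not part of the described method. Each target form covers 4^5 = 1024 sequences, which gives the 2048 figure when the two forms are counted separately. Note that 256 sequences carry G at both ends and so belong to both forms, leaving 1792 distinct sequences.

```python
from itertools import product

# Every possible 6-base site.
all_sites = ["".join(bases) for bases in product("ACGT", repeat=6)]

g_first = {s for s in all_sites if s[0] == "G"}   # 5'-GXX XXX-3'
g_last = {s for s in all_sites if s[-1] == "G"}   # 5'-XXX XXG-3'

print("all sites:       ", len(all_sites))
print("per target form: ", len(g_first), len(g_last))
print("in both forms:   ", len(g_first & g_last))
print("distinct sites:  ", len(g_first | g_last))
```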

The genes for the appropriate 2-finger domains are easily recovered by PCR.
Because of the design of the libraries, the "GCGG" or "GGCG" anchors serve to fix
the register of DNA-protein interaction very precisely. Hence, the required 2-finger
domains may be specifically amplified from the respective library constructs by
selective PCR using primers which bind only to the DNA sequence of finger 1 or
finger 2 or finger 3. The first finger of the eventual 3x2F construct is preceded by an
Xba I site and a MET codon. The second finger is joined to the third finger using an
engineered Eag I site. The fourth finger is joined to the fifth finger through a BamH I
site (at the end of finger 4) and a Bgl II site (at the start of finger 5). The sixth finger is
followed by an EcoR I site.

The sequences are designed such that, if finger 2 joins to itself via the Eag I
site, a Not I site is generated so this incorrect product can be recycled by digestion.
When finger 4 joins correctly to finger 5, both BamH I and Bgl II sites are destroyed;
however, incorrectly fused units can be redigested with the appropriate enzyme. Hence,
only the full-length 3x2F construct will be amplified with terminal primers following
ligation of the three 2-finger units.
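
The behaviour of the BamH I / Bgl II junction can be illustrated with the standard recognition sequences of the two enzymes (GGATCC and AGATCT, each cut after the first base to leave a GATC overhang). The helper names below are illustrative only.

```python
# Recognition sequences; both enzymes cut after the first base (GATC overhang).
SITES = {"BamH I": "GGATCC", "Bgl II": "AGATCT"}


def junction(upstream_enzyme, downstream_enzyme):
    """Top-strand sequence formed when an upstream cut end is ligated to a downstream cut end."""
    return SITES[upstream_enzyme][0] + SITES[downstream_enzyme][1:]


def recut_by(sequence):
    """Names of the enzymes whose recognition sequence is present in the sequence."""
    return [name for name, site in SITES.items() if site in sequence]


for up, down in [("BamH I", "Bgl II"), ("BamH I", "BamH I"), ("Bgl II", "Bgl II")]:
    joined = junction(up, down)
    print(f"{up} + {down}: {joined} recut by {recut_by(joined)}")
```

The correct finger 4 to finger 5 junction (BamH I end joined to Bgl II end) reads GGATCT, which neither enzyme recognises, whereas two like ends regenerate the original site and can be redigested.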

Using these construction techniques, the three 2-finger units selected as
described above are fused to form a 3x2F protein.

Example 24. Library Selection of 2-Finger Units for Construction of 3x2F
Peptides

As described above, 3x2F peptides may be made by linking 2 finger modules
with suitable linkers. The above examples describe the isolation of such 2 finger
modules by ligation of synthetic oligonucleotides. However, and as described here, 2
finger modules may be selected by phage display using libraries (the LIB12 and LIB23
libraries) comprising approximately one and a half fingers (see above and WO
98/53057).

Thus, the required 2-finger domains may be specifically amplified from the
library constructs by selective PCR, using primers which bind only to the DNA
sequence of finger 1 or finger 2 or finger 3. The sequences of these primers are
shown in Example 25.

The first finger of the eventual 3x2F construct is preceded by an Xba I site and
a MET codon. The second finger is joined to the third finger using an engineered Eag I
site. The fourth finger is joined to the fifth finger through a BamH I site (at the end of
finger 4) and a Bgl II site (at the start of finger 5). The sixth finger is followed by an
EcoR I site.

The sequences are designed such that, if finger 2 joins to itself via the Eag I
site, a Not I site is generated so this incorrect product can be recycled by digestion.
When finger 4 joins correctly to finger 5, both BamH I and Bgl II sites are destroyed;
however, incorrectly fused units can be redigested with the appropriate enzyme. Hence,
only the full-length 3x2F construct will be amplified with terminal primers following
ligation of the three 2-finger units.
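
The Not I safeguard can be checked against two of the primers listed in Example 25. The sketch below is an illustration under stated assumptions: it uses ACF3L12 as the C-terminal primer of unit A and BNF1L12 as the N-terminal primer of unit B, takes the standard recognition sequences for Eag I (CGGCCG) and Not I (GCGGCCGC), and models ligation simply by joining the bases that flank the Eag I site.

```python
EAG_I, NOT_I = "CGGCCG", "GCGGCCGC"


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gene_side(primer):
    """Primer bases 3' of its last Eag I site, i.e. the part adjoining the finger sequence."""
    return primer[primer.rindex(EAG_I) + len(EAG_I):]


ACF3L12 = "GC GGC CGC CGG CCG CTG GCC TCC TGT ATG GAT TTT GGT A".replace(" ", "")
BNF1L12 = "TCA AGC TGC CGG CCG TAC GCA TGC CCT GTC GAG TC".replace(" ", "")

# Coding-strand bases immediately upstream of the Eag I site at the end of finger 2.
end_of_finger2 = revcomp(gene_side(ACF3L12))

correct = end_of_finger2 + EAG_I + gene_side(BNF1L12)      # finger 2 joined to finger 3
self_joined = end_of_finger2 + EAG_I + gene_side(ACF3L12)  # finger 2 joined to itself

print("Not I site in correct junction:    ", NOT_I in correct)
print("Not I site in self-joined junction:", NOT_I in self_joined)
```

With these two primers, only the head-to-head product of unit A contains a Not I site, which is consistent with carrying out the ligation in the presence of Not I to remove it.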


Example 25. Primer Sequences 

Primers are named by the following method. A, B or C (in position 1) shows
which of the three 2-finger units is to be amplified: A is the first two fingers of the
3x2F construct, B is fingers 3 and 4, and C is fingers 5 and 6. N or C (in position 2)
shows whether the oligo primes from the N- or C-terminus. F1, F2 or F3 shows which
finger of the 3-finger library the primer binds to. L12, L23 or L123 shows whether the
primer binds specifically to LIB12, to LIB23, or to both libraries.

The final two primers are specific for the extreme N- and C-termini of the
3x2F constructs and are used to amplify the full-length ligation product from any
intermediate species.
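
The naming scheme can be expressed as a small parser. This is an illustrative sketch; the function name and the layout of the returned dictionary are choices made for the example and are not part of the described method.

```python
import re

# Unit letter, terminus letter, library finger, library suffix.
PRIMER_NAME = re.compile(r"^([ABC])([NC])F([123])L(12|23|123)$")
UNIT_FINGERS = {"A": (1, 2), "B": (3, 4), "C": (5, 6)}


def parse_primer_name(name):
    """Split a unit primer name such as 'ACF2L123' into its four components."""
    match = PRIMER_NAME.match(name)
    if match is None:
        raise ValueError(f"not a 2-finger unit primer name: {name}")
    unit, terminus, finger, libs = match.groups()
    libraries = ("LIB12", "LIB23") if libs == "123" else ("LIB" + libs,)
    return {
        "construct_fingers": UNIT_FINGERS[unit],  # position in the 3x2F construct
        "terminus": terminus,                     # N- or C-terminal primer
        "library_finger": int(finger),            # finger of the 3-finger library
        "libraries": libraries,
    }


print(parse_primer_name("ACF2L123"))
print(parse_primer_name("CNF1L23"))
```

The two terminal amplification primers (NXbaAMP and CEcoAMP) do not follow this pattern and are rejected by the parser.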

ANF1L12 (Xba I)
CAG TTG CGT CTA GAC GCC GCC ATG GCG GAG AGG CCC TAC GCA TGC

ANF2L123 (Xba I)
CAG TTG CGT CTA GAC GCC GCC ATG GCT GAG AGG CCC TTC CAG TGT CGA ATC TGC AT

ANF1L23 (Xba I)
CAG TTG CGT CTA GAC GCC GCC ATG GCA GAA CGC CCA TAT GCT TGC

ACF3L12 (Eag I)
GC GGC CGC CGG CCG CTG GCC TCC TGT ATG GAT TTT GGT A

ACF2L123 (Eag I)
CAT GGC ATT CGG CCG CTC GCC TCC TGT GTG GGT GCG GAT G

ACF3L23 (Eag I)
GC GGC CGC CGG CCG TTG TCC GCC CGT GTG TAT CTT GGT A

BNF1L12 (Eag I)
TCA AGC TGC CGG CCG TAC GCA TGC CCT GTC GAG TC

BNF2L123 (Eag I)
AGC TCT CAG CGG CCG TTC CAG TGT CGA ATC TGC AT

BNF1L23 (Eag I)
TCA AGC TGA CGG CCG TAT GCT TGC CCT GTC GAG TC

BCF3L12 (BamH I)
CGC GTC CTT CTG GGA TCC TGT ATG GAT TTT GGT A

BCF2L123 (BamH I)
ACC CTT CTC GGA TCC TGT GTG GGT GCG GAT G

BCF3L23 (BamH I)
C CGC ATC TTT TTG GGA TCC CGT GTG TAT CTT GGT A

CNF1L12 (Bgl II)
TCA AGC TGC AGA TCT GAG AGG CCC TAC GCA TGC CCT GTC

CNF2L123 (Bgl II)
ACG TCT ACG AGA TCT CAG AAG CCC TTC CAG TGT CGA ATC TGC AT

CNF1L23 (Bgl II)
TCA AGC TGA AGA TCT GAA CGC CCA TAT GCT TGC CCT GTC

CCF3L12 (EcoR I)
CAT TTA GGA ATT CCG GGC CGC GTC CTT CTG TCT CAG ATG GAT TTT

CCF2L123 (EcoR I)
CAT TTA GGA ATT CCG GGC CGC ATC CTT CTG GCG CAG GTG GGT GCG GAT G

CCF3L23 (EcoR I)
CAT TTA GGA ATT CCG GGC CGC ATC TTT TTG GCG CAG GTG TAT C

NXbaAMP (Xba I)
CAG TTG CGT CTA GAC GCC GCC

CEcoAMP (EcoR I)
CAT TTA GGA ATT CCG GGC CGC
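
As a consistency check on the primer list, each primer should contain the recognition sequence of the restriction enzyme assigned to its junction. The sketch below tests the six LIB12 primers; the recognition sequences are the standard ones for these enzymes, and the mapping from unit and terminus to enzyme is read from the construction scheme described in the preceding Examples.

```python
RECOGNITION = {
    "Xba I": "TCTAGA",
    "Eag I": "CGGCCG",
    "BamH I": "GGATCC",
    "Bgl II": "AGATCT",
    "EcoR I": "GAATTC",
}

# (unit, terminus) -> enzyme expected at that end of the 2-finger unit.
EXPECTED = {
    ("A", "N"): "Xba I", ("A", "C"): "Eag I",
    ("B", "N"): "Eag I", ("B", "C"): "BamH I",
    ("C", "N"): "Bgl II", ("C", "C"): "EcoR I",
}

PRIMERS = {
    "ANF1L12": "CAG TTG CGT CTA GAC GCC GCC ATG GCG GAG AGG CCC TAC GCA TGC",
    "ACF3L12": "GC GGC CGC CGG CCG CTG GCC TCC TGT ATG GAT TTT GGT A",
    "BNF1L12": "TCA AGC TGC CGG CCG TAC GCA TGC CCT GTC GAG TC",
    "BCF3L12": "CGC GTC CTT CTG GGA TCC TGT ATG GAT TTT GGT A",
    "CNF1L12": "TCA AGC TGC AGA TCT GAG AGG CCC TAC GCA TGC CCT GTC",
    "CCF3L12": "CAT TTA GGA ATT CCG GGC CGC GTC CTT CTG TCT CAG ATG GAT TTT",
}


def has_expected_site(name, sequence):
    """True if the primer contains the site expected from its unit and terminus letters."""
    enzyme = EXPECTED[(name[0], name[1])]
    return RECOGNITION[enzyme] in sequence.replace(" ", "")


results = {name: has_expected_site(name, seq) for name, seq in PRIMERS.items()}
for name, ok in results.items():
    print(name, EXPECTED[(name[0], name[1])], "present" if ok else "MISSING")
```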



Example 26. Selection of Sites and Construction of 3x2F ZnF to Bind the GC Box /
NRF-1 Site in Promoter Region of the CXCR4 Gene



Promoter Sequence (top) with potential 6 bp sites marked below.

     5' TCCCCGCCCCAGCGGCGCATGCGCCGCGC 3'
A       TCCCCGCCCCAG GGCGCA GCGCCG
B            GCCCCAGCGGCGCATGCG
C                CAGCGGCGCATG

N.B. 6 bp sites are chosen which are either adjacent or within 1 bp of each
other, as 2-finger units bind optimally when within 1 bp of each other.

PROTOCOL

i) Select sites on row B.

Perform selections in the usual manner. GCCCCA: target with LIB12 and take
fingers 1 and 2. Gives F5+F6 of the 3x2 construct. GCGGCG: may be targeted by LIB12
and take fingers 1 and 2, or fingers 2 and 3; or may be targeted by LIB23 and take
fingers 2 and 3 or fingers 1 and 2. Generates F3+F4 of the 3x2 construct. CATGCG:
can be targeted by LIB23 and take fingers 2 and 3. Gives F1+F2 of the 3x2 construct.

ii) Join 2-finger units to create 3x2F peptide.

PCR amplify fingers binding appropriate sequences. Purify 2-finger products.
Combine products, digest with Eag I, BamH I and Bgl II. Heat inactivate Eag I. Ligate
fragments together in the presence of Not I, BamH I and Bgl II to destroy incorrectly
ligated fragments. PCR amplify 6-finger construct with N- and C-terminal specific
primers. Digest with Xba I and EcoR I, ligate into similarly digested vector, pTracer.
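
The placement of the three row B sites on the promoter sequence can be confirmed with a short script. This is a minimal sketch; the function name is illustrative, and it simply locates each 6 bp site in turn and reports the gap between consecutive sites.

```python
promoter = "TCCCCGCCCCAGCGGCGCATGCGCCGCGC"
row_b = ["GCCCCA", "GCGGCG", "CATGCG"]


def locate(sequence, sites):
    """Return (site, start, end) for each site, searching left to right without overlap."""
    placed, cursor = [], 0
    for site in sites:
        start = sequence.index(site, cursor)
        placed.append((site, start, start + len(site)))
        cursor = start + len(site)
    return placed


placed = locate(promoter, row_b)
gaps = [nxt[1] - cur[2] for cur, nxt in zip(placed, placed[1:])]

for site, start, end in placed:
    print(f"{site}: bases {start + 1}-{end}")
print("gaps between consecutive sites (bp):", gaps)
```

The three sites occupy bases 6-11, 12-17 and 18-23 with no intervening bases, so they satisfy the requirement that neighbouring 2-finger sites lie within 1 bp of each other.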

Example 27. Comparison of a 2x3F Peptide and a Similar 3x2F Peptide

A. Creation of a 2x3F Peptide

3-finger units are selected to bind the 9-bp target sequences, 11 and 9 (below),
essentially as described above and also in WO 98/53057.

11      GCA GGG GTT
9       GGC CAG GCG
11-9    GGC CAG GCG GCA GGG GTT

The 3 finger peptide which binds site 11 is referred to as pep11, and the 3
finger peptide which binds site 9 is referred to as pep9. To create a 2x3F peptide,
pep11 is joined to the N-terminus of pep9, using the procedure below, and the new
6-finger construct is called 2x3F pep11-9. This new peptide targets the contiguous
sequence 11-9, shown above.

All primer sequences in this Example are the same as the corresponding
sequences in Example 25 having the same name. Primer CWT2 is identical to Primer a
(SEQ ID NO: 2); Primer NWT3S is identical to Primer B (SEQ ID NO: 3); Primer
CGAC1 is identical to Primer c (SEQ ID NO: 6); Primer NGAC2F is identical to
Primer D (SEQ ID NO: 7). Primer 3x2CF3L23 has the following sequence: GC GGC
CGC CGG CCG CTG GCC CGT GTG TAT CTT GGT A.

The sequence of 2x3F pep11-9 is shown in Figure 27, and the sequence of
3x2F pep11-9 is shown in Figure 28.

Construction Procedure

Primer pairs ANF1L12 and BCF3L23, and CNF1L23 and CCF3L23, are
used to amplify the DNA encoding pep11 and pep9 respectively. This creates a BamH I
site at the 3' end of the pep11 gene and a Bgl II site at the 5' end of the pep9 gene.
Hence, digestion of the PCR fragments with these enzymes, followed by ligation,
creates the 6-finger construct 2x3F pep11-9, in which both original enzyme sites are
destroyed and the peptide linker sequence -TGSERP- is created. The full-length
fragment is then digested with Xba I and EcoR I and ligated into similarly digested
pTracer (Invitrogen).
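
The origin of the -TGSERP- linker can be traced through the junction. The sketch below is illustrative and rests on two assumptions: the in-frame fragment ends are read from the reverse complement of primer BCF3L23 and from primer CNF1L23 as listed in Example 25, and only the six codons needed for this junction are included in the lookup table.

```python
BAMH_I, BGL_II = "GGATCC", "AGATCT"  # both cut after the first base (GATC overhang)

# Partial codon table: only the codons that occur at this junction.
CODONS = {"ACG": "T", "GGA": "G", "TCT": "S", "GAA": "E", "CGC": "R", "CCA": "P"}


def ligate(upstream, downstream):
    """Join a BamH I-cut upstream fragment to a Bgl II-cut downstream fragment."""
    left = upstream[: upstream.rindex(BAMH_I) + 1]
    right = downstream[downstream.index(BGL_II) + 1:]
    return left + right


def translate(dna):
    """Translate an in-frame DNA string using the partial codon table above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


pep11_3prime = "ACGGGATCC"       # in-frame end of the pep11 fragment, ending in the BamH I site
pep9_5prime = "AGATCTGAACGCCCA"  # in-frame start of the pep9 fragment, beginning with the Bgl II site

joined = ligate(pep11_3prime, pep9_5prime)
print(joined)
print(translate(joined))
print("BamH I site present:", BAMH_I in joined, "| Bgl II site present:", BGL_II in joined)
```

The ligated junction reads ACG GGA TCT GAA CGC CCA, which translates to TGSERP and contains neither original restriction site.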

B. Creation of the 3x2F Peptide

To give a direct comparison between a selected 2x3F peptide and a 3x2F
peptide targeted against the same DNA sequences, the zinc fingers of pep11 and pep9
are fused together in the style of a 3x2F peptide, using the procedure outlined below.
This peptide, called 3x2F pep11-9, targets the contiguous DNA sequence 11-9, above.
Again, primer and peptide sequences are as shown above and in the Figures.

Construction Procedure

Fingers 1 and 2 of pep11 are amplified by PCR using primers ANF1L12 and
CWT2. Separately, finger 3 of pep11 is amplified using primers NWT3S and
3x2CF3L23. The 3-finger fragment pep11(3x2) is then created by overlap PCR using
the above fragments. Similarly, finger 1 of pep9 is amplified using primers BNF1L23
and CGAC1, and fingers 2 and 3 of pep9 are amplified using primers NGAC2S and
CCF3L23. The 3-finger fragment pep9(3x2) is then created by overlap PCR. The
primers 3x2CF3L23 and BNF1L23 produce Eag I restriction sites at the 3' and 5'
ends of pep11(3x2) and pep9(3x2) respectively. Hence, digestion of the two 3-finger
fragments with Eag I, followed by ligation, creates the 6-finger construct 3x2F pep11-9.
In this peptide the linker sequences -TGGEKP- and -TGGQKP- are inserted
between fingers 2 and 3 and fingers 4 and 5 respectively, and the sequence -TGQRP-
separates fingers 3 and 4. The full-length fragment is then digested with Xba I and
EcoR I and ligated into similarly digested pTracer (Invitrogen), as above.
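
The linkers in the two constructs can be compared with the canonical linker sequences named in the claims (GEKP, GERP, GQKP and GQRP). The sketch below is an illustration only: it assumes the leading T belongs to the preceding finger, and it treats a linker as "extended" when a canonical sequence can be recovered from it by deleting residues.

```python
CANONICAL = ("GEKP", "GERP", "GQKP", "GQRP")


def is_subsequence(short, long):
    """True if 'short' can be obtained from 'long' by deleting residues."""
    remaining = iter(long)
    return all(residue in remaining for residue in short)


def classify_linker(linker):
    """Return (kind, canonical sequence, number of inserted residues)."""
    core = linker[1:] if linker.startswith("T") else linker
    if core in CANONICAL:
        return ("canonical", core, 0)
    for canon in CANONICAL:
        if is_subsequence(canon, core):
            return ("extended", canon, len(core) - len(canon))
    return ("other", None, None)


for linker in ("TGGEKP", "TGGQKP", "TGQRP", "TGSERP"):
    print(linker, classify_linker(linker))
```

On this reading, the 3x2F construct has one-residue extensions between fingers 2 and 3 and between fingers 4 and 5 with a canonical linker between fingers 3 and 4, while the -TGSERP- linker of the 2x3F construct is a one-residue extension of GERP.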

C. Methods

The 2x3F pep11-9 and 3x2F pep11-9 peptides are compared by assessing their
binding affinities for the 11-9 binding site and for binding site sequences mutated in
the region bound by finger 1 (11-9mut1), finger 3 (11-9mut3), or with the bases
bound by finger 3 deleted (11-9del3). These sequences are shown below; the mutated
regions are the final triplet of 11-9mut1 (ACC) and the fourth triplet of 11-9mut3 (ATG),
and the fourth triplet (GCA) is absent from 11-9del3.

11-9:       GGC CAG GCG GCA GGG GTT
11-9mut1:   GGC CAG GCG GCA GGG ACC
11-9mut3:   GGC CAG GCG ATG GGG GTT
11-9del3:   GGC CAG GCG     GGG GTT
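
The relationship between each mutated triplet and the finger it affects can be made explicit. The sketch below assumes that finger n of the 6-finger peptide contacts triplet 7 - n counted from the 5' end of the site as written, which is consistent with the names of the mutant sites; the function name is illustrative.

```python
def altered_fingers(reference, variant, n_fingers=6):
    """Fingers whose triplet differs between two space-separated sites of equal length."""
    ref, var = reference.split(), variant.split()
    if len(ref) != len(var):
        raise ValueError("sites must have the same number of triplets")
    # Triplet i (0-based, from the 5' end) is assigned to finger n_fingers - i.
    return [n_fingers - i for i, (r, v) in enumerate(zip(ref, var)) if r != v]


site_11_9 = "GGC CAG GCG GCA GGG GTT"
mut1 = "GGC CAG GCG GCA GGG ACC"
mut3 = "GGC CAG GCG ATG GGG GTT"

print("11-9mut1 alters finger(s):", altered_fingers(site_11_9, mut1))
print("11-9mut3 alters finger(s):", altered_fingers(site_11_9, mut3))
```

The deletion site 11-9del3 has one triplet fewer than the reference and is therefore not handled by this comparison.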

In vitro fluorescence ELISA is used to estimate the binding specificity of each
peptide for the various target sites, as described below.

Protocol for In Vitro Fluorescence ELISA 

Preparation of Template

Zinc finger constructs are inserted into the protein expression vector pTracer
(Invitrogen), downstream of the T7 RNA transcription promoter. Suitable templates
for in vitro ELISA are created by PCR using the 5' primer
(GCAGAGCTCTCTGGCTAACTAGAG), which binds upstream of the T7 promoter,
and a 3' primer, which binds to the 3' end of the zinc finger construct and adds a
sequence encoding the HA-antibody epitope tag (YPYDVPDYA).

Zinc Finger Expression 

In vitro transcription and translation are performed using the T7 TNT Quick
Coupled Transcription / Translation System for PCR templates (Promega), according
to the manufacturer's instructions, except that the medium is supplemented with 500
µM ZnCl2.

Fluorescence ELISA

DNA binding reactions contain the appropriate zinc finger peptide,
biotinylated binding site (10 nM) and 5 µg competitor DNA (sonicated salmon sperm
DNA), in a total volume of 50 µl, which contains: 1 x PBS (pH 7.0), 1.25 x 10° U
high affinity anti-HA-Peroxidase antibody (Boehringer Mannheim), 50 µM ZnCl2,
0.01 mg/ml BSA, and 0.5% Tween 20. Incubations are performed at room temperature
for 40 minutes. Black streptavidin-coated wells are blocked with 4% Marvel for 1
hour. Binding reactions are added to the streptavidin-coated wells and incubated for a
further 40 minutes at room temperature. Wells are washed 5 times in 100 µl wash
buffer (1 x PBS (pH 7.0), 50 µM ZnCl2, 0.01 mg/ml BSA, and 0.5% Tween 20), and
finally 50 µl QuantaBlu peroxidase substrate solution (Pierce) is added to detect bound
HA-tagged zinc finger peptide. ELISA signals are read in a SPECTRAmax GeminiXS
spectrophotometer (Molecular Devices) and analysed using SOFTmax Pro 3.1.2
(Molecular Devices).

D. Results

In Vitro Fluorescence ELISA Assay

To compare the specificity of the 2x3F pep11-9 and 3x2F pep11-9 peptides,
samples from the same translation reaction are assayed against each of the binding
sites above. The ELISA signals obtained from each assay are then normalised relative
to the maximum signal obtained for that peptide. (In this way, the absolute amount of
either peptide produced by the in vitro transcription / translation system does not
affect the comparison.) These data are then plotted on a graph, shown as Figure 26.
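
The normalisation step can be sketched as below. The readings are invented placeholder values used only to show the calculation; they are not data from Figure 26, and the function name is illustrative.

```python
def normalise(signals):
    """Scale each ELISA signal by the maximum signal obtained for the same peptide."""
    top = max(signals.values())
    return {site: value / top for site, value in signals.items()}


# Placeholder readings for one peptide (arbitrary fluorescence units, NOT measured data).
placeholder_signals = {
    "11-9": 8000.0,
    "11-9mut1": 2000.0,
    "11-9mut3": 400.0,
    "11-9del3": 100.0,
}

relative = normalise(placeholder_signals)
for site, value in relative.items():
    print(f"{site}: {value:.4f}")
```

Because each peptide is scaled to its own maximum, two peptides expressed at different levels can be compared on the same 0 to 1 axis.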



As can be seen, the data demonstrate that the 3x2F peptide shows greater
selectivity / specificity for its correct target sequence, over mutant sequences, than
does the 2x3F peptide.

Each of the applications and patents mentioned above, and each document 
cited or referenced in each of the foregoing applications and patents, including during 
the prosecution of each of the foregoing applications and patents ("application cited 
documents") and any manufacturer's instructions or catalogues for any products cited 
or mentioned in each of the foregoing applications and patents and in any of the 
application cited documents, are hereby incorporated herein by reference. 
Furthermore, all documents cited in this text, and all documents cited or referenced in 
documents cited in this text, and any manufacturer's instructions or catalogues for any 
products cited or mentioned in this text, are hereby incorporated herein by reference. 
In particular, we hereby incorporate by reference International Patent Application 
Numbers PCT/GB00/02080, PCT/GB00/02071, PCT/GB00/03765, United Kingdom 
Patent Application Numbers GB0001582.6, GB0001578.4, and GB9912635.1 as well 
as US09/478513. 

Various modifications and variations of the described methods and system of 
the invention will be apparent to those skilled in the art without departing from the 
scope and spirit of the invention. Although the invention has been described in 
connection with specific preferred embodiments, it should be understood that the 
invention as claimed should not be unduly limited to such specific embodiments. 
Indeed, various modifications of the described modes for carrying out the invention 
which are obvious to those skilled in molecular biology or related fields are intended 
to be within the scope of the following claims.

Claims

1. A method of producing a modified nucleic acid binding polypeptide, the
method comprising the steps of:

(a) providing a nucleic acid binding polypeptide comprising a plurality of
nucleic acid binding modules;

(b) selecting a first binding domain consisting of one or two contiguous nucleic 
acid binding modules; 

(c) selecting a second binding domain consisting of one or two contiguous 
nucleic acid binding modules; and 

(d) introducing a linker sequence to link the first and second binding domains,
the linker sequence comprising five or more amino acid residues.

2. A method of producing a modified nucleic acid binding polypeptide, the 
method comprising the steps of: 

(a) providing a nucleic acid binding polypeptide comprising a plurality of 
nucleic acid binding modules;

(b) selecting a first binding domain comprising a nucleic acid binding module; 

(c) selecting a second binding domain comprising a nucleic acid binding 
module; and 

(d) introducing a linker sequence comprising a structured linker to link the first 
and second binding domains.

3. A method of producing a modified nucleic acid binding
polypeptide, the method comprising the steps of:

(a) providing a nucleic acid binding polypeptide comprising a plurality of
nucleic acid binding modules; 

(b) selecting a first binding domain consisting of one or two contiguous nucleic 
acid binding modules; 

(c) selecting a second binding domain consisting of one or two contiguous
nucleic acid binding modules;

(d) introducing a first linker sequence to link the first and second binding 
domains, the linker sequence comprising five or more amino acid residues; 

(e) selecting a third binding domain comprising a nucleic acid binding module; 

(f) selecting a fourth binding domain comprising a nucleic acid binding
module; and

(g) introducing a second linker sequence comprising a structured linker to link 
the third and fourth binding domains. 

4. A method according to Claim 1 or Claim 2, in which steps (b) to (d) are
repeated.

5. A method according to Claim 3, in which steps (b) to (d) and/or steps (e) to (g) 
are repeated. 

6. A method according to any preceding claim, in which the binding affinity
and/or specificity of the modified polypeptide to a nucleic acid sequence is increased
compared to the binding affinity and/or specificity of an unmodified polypeptide.

7. A method according to any preceding claim, in which the nucleic acid 
sequence comprises a sequence which is bound by the unmodified polypeptide.

8. A method according to any preceding claim, in which the nucleic acid
sequence comprises a sequence bound by the unmodified nucleic acid binding 
polypeptide, into which one or more nucleic acid residues has been inserted. 

9. A method according to any preceding claim, in which the nucleic acid
residue(s) are inserted between target subsites bound by the first and second binding
domains of the unmodified polypeptide.

10. A method according to Claim 8 or 9, in which the number of inserted nucleic 
acid residues is 5, 6, 7, 8, 9, 10 or 11.

11. A method of making a nucleic acid binding polypeptide, the method
comprising the steps of:

(a) providing a first binding domain and a second binding domain, at least one 
of the first and second binding domains consisting of one or two nucleic acid 
binding module(s); and 

(b) linking the first and second binding domains with a linker sequence 
comprising five or more amino acid residues.

12. A method of making a nucleic acid binding polypeptide, the method 
comprising the steps of: 

(a) providing a first binding domain comprising a nucleic acid binding module; 

(b) providing a second binding domain comprising a nucleic acid binding 
module; and

(c) linking the first and second binding domains with a linker sequence 
comprising a structured linker.

13. A method of making a nucleic acid binding polypeptide, the method
comprising the steps of: 

(a) providing a first binding domain and a second binding domain, at least one 
of the first and second binding domains consisting of one or two nucleic acid
binding module(s);

(b) linking the first and second binding domains with a first linker sequence 
comprising five or more amino acid residues; 

(c) providing a third binding domain comprising a nucleic acid binding 
module; 

(d) providing a fourth binding domain comprising a nucleic acid binding
module; and

(e) linking the third and fourth binding domains with a second linker sequence 
comprising a structured linker. 

14. A method according to Claim 1, 3 or 13, in which the first linker sequence
comprises a flexible linker.

15. A nucleic acid binding polypeptide comprising a first binding domain and a 
second binding domain linked by a linker sequence comprising five or more amino 
acid residues, in which at least one of the first and second binding domains consists of 
one or two nucleic acid binding module(s). 

16. A non-naturally occurring nucleic acid binding polypeptide comprising a first
binding domain comprising a nucleic acid binding module and a second binding
domain comprising a nucleic acid binding module, the first and second binding
domains being linked by a linker sequence comprising a structured linker.

17. A nucleic acid binding polypeptide comprising a first binding domain
comprising a nucleic acid binding module and a second binding domain comprising a 
nucleic acid binding module, the first and second binding domains being linked by a 
linker sequence comprising a structured linker; a third binding domain consisting of
one or two contiguous nucleic acid binding modules and a fourth binding domain
consisting of one or two contiguous nucleic acid binding modules, the third and fourth 
binding domains being linked by a second linker sequence comprising five or more 
amino acid residues. 

18. A method or polypeptide according to any preceding claim, in which the 
nucleic acid binding module is a zinc finger of the Cys2-His2 type.

19. A method or polypeptide according to any preceding claim, in which the 
nucleic acid binding module is selected from the group consisting of naturally 
occurring zinc fingers and consensus zinc fingers. 

20. A method or polypeptide according to any of Claims 1, 3, 11, 13, 15 or 17, in
which each of the first and the second binding domains consists of two binding
modules.

21. A method or polypeptide according to any preceding claim, in which the first
linker sequence comprises between 5 and 8 amino acid residues. 

22. A method or polypeptide according to any preceding claim, in which the first 
linker sequence is provided by insertion of one or more amino acid residues into a
canonical linker sequence.

23. A method or polypeptide according to Claim 22, in which the canonical linker 
sequence is selected from GEKP, GERP, GQKP and GQRP.

24. A method or polypeptide according to any preceding claim, in which the first
linker sequence comprises a sequence selected from: GGEKP, GGQKP, GGSGEKP, 
GGSGQKP, GGSGGSGEKP, and GGSGGSGQKP. 

25. A method or polypeptide according to any preceding claim, in which the
nucleic acid binding polypeptide comprises a nucleic acid sequence selected from SEQ
ID Nos: 22, 23, 24, 25, 26 and 27.

26. A method or polypeptide according to any preceding claim, in which the 
structured linker comprises an amino acid sequence which is not capable of 
specifically binding nucleic acid. 

27. A method or polypeptide according to any preceding claim, in which the
structured linker is derived from a zinc finger by mutation of one or more of its base
contacting residues to reduce or abolish nucleic acid binding activity of the zinc finger.

28. A method or polypeptide according to any preceding claim, in which the 
structured linker comprises the amino acid sequence of TFIIIA finger IV.

29. A method or polypeptide according to any preceding claim, in which the zinc
finger is finger 2 of wild type Zif268 mutated at positions -1, 2, 3 and 6. 

30. A method or polypeptide according to any of Claims 2, 12, or 16, in which the
first or second nucleic acid binding domain is selected from the group consisting of:
fingers 1 to 3 of TFIIIA, GAC and Zif, or a method or polypeptide according to any of
Claims 3, 13 or 17, in which the third or fourth nucleic acid binding domain is selected
from said group.

31. A method or polypeptide according to any preceding claim, in which the 
nucleic acid binding polypeptide comprises substantially the sequence of
Zif-ZnF-GAC (SEQ ID NO: 55), GAC-F4-Zif (SEQ ID NO: 54) or TF(1-4)-ZIF (SEQ ID NO:
53).

32. A method or polypeptide according to any preceding claim, in which the or 
each linker sequence comprises one or more further sequence(s), each further sequence
comprising a canonical linker sequence, preferably GEKP, GERP, GQKP or GQRP,
optionally comprising one or more amino acid sequences inserted into the canonical 
sequence. 

33. A method or polypeptide according to Claim 32, in which said further 
sequences are selected from: GGEKP, GGQKP, GGSGEKP, GGSGQKP,
GGSGGSGEKP, and GGSGGSGQKP.

34. A nucleic acid binding polypeptide produced by a method according to any of 
Claims 1 to 14 and 18 to 33. 

35. A nucleic acid encoding a nucleic acid binding polypeptide according to any of 
Claims 15 to 33. 

36. A host cell transformed with a nucleic acid according to Claim 35.

37. A pharmaceutical composition comprising a polypeptide according to any of 
Claims 15 to 33 or a nucleic acid according to Claim 35 together with a 
pharmaceutically acceptable carrier. 

38. Use of a structured linker in a method of making a nucleic acid binding 
polypeptide.

39. Use according to Claim 38, in which the structured linker separates first and 
second nucleic acid binding domains of the nucleic acid binding polypeptide, to enable
the polypeptide to bind a nucleic acid target in which subsites bound by respective
domains of the polypeptide are separated by one or more nucleic acid residues. 

40. A nucleic acid binding polypeptide comprising a repressor domain and a 
plurality of nucleic acid binding domains, the nucleic acid binding domains being
linked by at least one non-canonical linker.

41. A nucleic acid binding polypeptide according to Claim 40, in which the
repressor domain is a transcriptional repressor domain selected from the group 
consisting of: a KRAB-A domain, an engrailed domain and a snag domain. 

42. A nucleic acid binding polypeptide according to Claim 40 or 41 , in which the 
1 0 nucleic acid binding domains are linked by at least one flexible linker. 

43. A nucleic acid binding polypeptide according to Claim 40, 41 or 42, in which 
the nucleic acid binding domains are linked by at least one structured linker. 

44. Use of a nucleic acid binding domain comprising two zinc finger modules as a 
basic unit in the construction of a nucleic acid binding polypeptide. 

45. A method of producing a nucleic acid binding polypeptide, the method
comprising providing a first and a second nucleic acid binding domain each
comprising two zinc finger modules, and linking the first and second nucleic acid
binding domains with a structured linker sequence or a flexible linker sequence.

46. Use of an amino acid sequence comprising five or more amino acid residues as a
flexible linker to join two or more nucleic acid binding domains comprising two zinc
finger modules.

47. Use of an amino acid sequence comprising a zinc finger which is not capable
of specifically binding nucleic acid, as a structured linker to join two or more nucleic 
acid binding domains comprising two zinc finger modules. 

48. Use according to Claim 44, 46 or 47 or a method according to Claim 45, in
which the nucleic acid binding domain is selected from a zinc finger polypeptide
library, in which each polypeptide in the library comprises more than one zinc finger
and wherein each polypeptide has been at least partially randomised such that the
randomisation extends to cover the overlap of a single pair of zinc fingers.

49. A method for producing nucleic acid binding domains comprising two zinc
finger modules for use in constructing a nucleic acid binding polypeptide, the method
comprising the steps of:

(a) providing a zinc finger polypeptide library, in which each polypeptide in the
library comprises more than one zinc finger and wherein each polypeptide has
been at least partially randomised such that the randomisation extends to cover
the overlap of a single pair of zinc fingers;

(b) providing a nucleic acid sequence comprising at least 6 nucleotides; and

(c) selecting sequences in the zinc finger library which are capable of binding
to the nucleic acid sequence.

50. A use or method according to Claim 48, or a method according to Claim 49, in
which substantially one and a half zinc fingers are randomised in each polypeptide.

51. A nucleic acid binding polypeptide comprising units of zinc finger binding 
domains linked by flexible and/or structured linkers, each zinc finger binding domain 
comprising two zinc finger modules. 
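
As an illustration of Claims 32 and 33 above, the following Python sketch derives the further sequences listed in Claim 33 from the canonical linkers GEKP and GQKP by placing a short glycine/serine insert in front of the canonical sequence. The function name `extended_linkers` and the choice of inserts (G, GGS, GGSGGS) are assumptions made for this illustration, inferred from the listed sequences; they are not definitions taken from the claims.

```python
# Canonical linker sequences named in Claim 32 that recur in Claim 33.
CANONICAL = ("GEKP", "GQKP")

# Hypothetical inserts, inferred from the sequences listed in Claim 33.
INSERTS = ("G", "GGS", "GGSGGS")


def extended_linkers(canonical=CANONICAL, inserts=INSERTS):
    """Return each insert followed by each canonical linker, grouped by insert."""
    return [ins + can for ins in inserts for can in canonical]


# Sequences as listed in Claim 33, in the order given there.
CLAIM_33 = ["GGEKP", "GGQKP", "GGSGEKP", "GGSGQKP", "GGSGGSGEKP", "GGSGGSGQKP"]

if __name__ == "__main__":
    for seq in extended_linkers():
        print(seq)
```

Other decompositions of the same six sequences are possible (for example, inserting residues after the first glycine of the canonical linker); the prefix form is used here only because it is the shortest to write down.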

ZIF-GAC

SEQ ID NO: 21 

1/1 31/11 

ATG GCA GAA CGC CCG TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC TTT TCT CGC TCG 
MAERPYACPVESCDRRFSRS 
61/21 91/31 

GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGC CAG AAG CCC TTC CAG TGT CGA ATC 
DELTRHIRIHTGQKPFQCRI 
121/41 151/51 

TGC ATG CGT AAC TTC AGT CGT AGT GAC CAC CTT ACC ACC CAC ATC CGC ACC CAC ACA GGC 
CMRNFSRSDHLTTHIRTHTG 
181/61 211/71 

GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT GAA CGC AAG 
EKPFACDICGRKFARSDERK
241/81 271/91 

AGG CAT ACC AAA ATC CAT TTA AGA CAG AAG GAC GGC GAA CGG CCG TAT GCT TGC CCT GTC 
RHTKIHLRQKDGERPYACPV 
301/101 331/111 

GAG TCC TGC GAT CGC CGC TTT TCT CGC TCG GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC 
ESCDRRFSRSDELTRHIRIH 
361/121 391/131 

ACA GGC CAG AAG CCC TTC CAG TGT CGA ATC TGC ATG CGT AAC TTC AGT GAT AGA AGC AAT 
TGQKPFQCRICMRNFSDRSN
421/141 451/151 

CTT GAA CGT CAC ACG AGG ACC CAC ACA GGC GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG 
LERHTRTHTGEKPFACDICG
481/161 511/171 

AGG AAG TTT GCC AGG AGT GAT GAA CGC AAG AGG CAT ACC AAA ATC CAT TTA AGA CAG AAG 

RKFARSDERKRHTKIHLRQK 

541/181 

GAC 



FIG. 2 
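
The sequence listings in these figures pair each row of 20 codons with its one-letter translation and label each row with nucleotide/residue coordinates such as 1/1 and 31/11. A minimal Python sketch of that layout, assuming the standard genetic code, is given below. The function names `translate` and `row_markers` are hypothetical and are introduced only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string (spaces allowed) codon by codon."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def row_markers(row, codons_per_row=20):
    """Return the two 'nucleotide/residue' labels for a zero-based row index."""
    aa = row * codons_per_row + 1      # first residue in the row
    nt = 3 * (aa - 1) + 1              # first nucleotide of that residue
    half = codons_per_row // 2         # second label sits at mid-row
    return f"{nt}/{aa}", f"{nt + 3 * half}/{aa + half}"


# First row of ZIF-GAC (SEQ ID NO: 21) as printed in FIG. 2.
ROW_1 = ("ATG GCA GAA CGC CCG TAT GCT TGC CCT GTC "
         "GAG TCC TGC GAT CGC CGC TTT TCT CGC TCG")

if __name__ == "__main__":
    print(row_markers(0), translate(ROW_1))
```

Running the same translation over every printed row is a quick way to spot transcription errors in either the codon line or the residue line of a listing.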

3x2F ZGS

SEQ ID NO: 22

1/1 31/11 

ATG GCA GAA CGC CCG TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC TTT TCT CGC TCG 
MAERPYACPVESCDRRFSRS 
61/21 91/31 

GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGC CAG AAG CCC TTC CAG TGT CGA ATC 
DELTRHIRIHTGQKPFQCRI
121/41 151/51 

TGC ATG CGT AAC TTC AGT CGT AGT GAC CAC CTT ACC ACC CAC ATC CGC ACC CAC ACA GGT 
CMRNFSRSDHLTTHIRTHTG 
181/61 211/71 

GGC GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT GAA CGC 
GEKPFACDICGRKFARSDER
241/81 271/91 

AAG AGG CAT ACC AAA ATC CAT ACC GGT GAA CGG CCG TAT GCT TGC CCT GTC GAG TCC TGC 
KRHTKIHTGERPYACPVESC 
301/101 331/111 

GAT CGC CAC TTT TCT CGC TCG GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGT GGC 
DRHFSRSDELTRHIRIHTGG
361/121 391/131 

CAG AAG CCC TTC CAG TGT CGA ATC TGC ATG CGT AAC TTC AGT GAT AGA AGC AAT CTT GAA 
QKPFQCRICMRNFSDRSNLE
421/141 451/151 

CGT CAC ACG AGG ACC CAC ACA GGC GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG AGG AAG 
RHTRTHTGEKPFACDICGRK
481/161 511/171 

TTT GCC AGG AGT GAT GAA CGC AAG AGG CAT ACC AAA ATC CAT TTA AGA CAG AAG GAC 
FARSDERKRHTKIHLRQKD



FIG. 3

3x2F ZGL
SEQ ID NO: 23

1/1 31/11 

ATG GCA GAA CGC CCG TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC TTT TCT CGC TCG
MAERPYACPVESCDRRFSRS 
61/21 91/31 

GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGC CAG AAG CCC TTC CAG TGT CGA ATC 
DELTRHIRIHTGQKPFQCRI
121/41 151/51 

TGC ATG CGT AAC TTC AGT CGT AGT GAC CAC CTT ACC ACC CAC ATC CGC ACC CAC ACA GGC
CMRNFSRSDHLTTHIRTHTG
181/61 211/71 

GGT TCT GGC GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT
GSGEKPFACDICGRKFARSD 
241/81 271/91 

GAA CGC AAG AGG CAT ACC AAA ATC CAT ACC GGT GAA CGG CCG TAT GCT TGC CCT GTC GAG 
ERKRHTKIHTGERPYACPVE 
301/101 331/111 

TCC TGC GAT CGC CAC TTT TCT CGC TCG GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA 
SCDRHFSRSDELTRHIRIHT
361/121 391/131 

GGC GGT TCT GGC CAG AAG CCC TTC CAG TGT CGA ATC TGC ATG CGT AAC TTC AGT GAT AGA
GGSGQKPFQCRICMRNFSDR 
421/141 451/151 

AGC AAT CTT GAA CGT CAC ACG AGG ACC CAC ACA GGC GAG AAG CCT TTT GCC TGT GAC ATT 
SNLERHTRTHTGEKPFACDI 
481/161 511/171 

TGT GGG AGG AAG TTT GCC AGG AGT GAT GAA CGC AAG AGG CAT ACC AAA ATC CAT TTA AGA 

CGRKFARSDERKRHTKIHLR 

541/181 

CAG AAG GAC 

QKD



FIG. 4

3x2F ZGXL
SEQ ID NO: 24

1/1 31/11 

ATG GCA GAA CGC CCG TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC TTT TCT CGC TCG 
MAERPYACPVESCDRRFSRS 
61/21 91/31 

GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGC CAG AAG CCC TTC CAG TGT CGA ATC 
DELTRHIRIHTGQKPFQCRI 
121/41 151/51 

TGC ATG CGT AAC TTC AGT CGT AGT GAC CAC CTT ACC ACC CAC ATC CGC ACC CAC ACA GGC 
CMRNFSRSDHLTTHIRTHTG
181/61 211/71 

GGT TCT GGC GGT TCT GGC GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG AGG AAG TTT GCC
GSGGSGEKPFACDICGRKFA
241/81 271/91 

AGG AGT GAT GAA CGC AAG AGG CAT ACC AAA ATC CAT ACC GGT GAA CGG CCG TAT GCT TGC 
RSDERKRHTKIHTGERPYAC
301/101 331/111 

CCT GTC GAG TCC TGC GAT CGC CAC TTT TCT CGC TCG GAT GAG CTT ACC CGC CAT ATC CGC 
PVESCDRHFSRSDELTRHIR
361/121 391/131 

ATC CAC ACA GGC GGT TCT GGC GGT TCT GGC CAG AAG CCC TTC CAG TGT CGA ATC TGC ATG
IHTGGSGGSGQKPFQCRICM
421/141 451/151 

CGT AAC TTC AGT GAT AGA AGC AAT CTT GAA CGT CAC ACG AGG ACC CAC ACA GGC GAG AAG 
RNFSDRSNLERHTRTHTGEK 
481/161 511/171 

CCT TTT GCC TGT GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT GAA CGC AAG AGG CAT 

PFACDICGRKFARSDERKRH 

541/181 571/191 

ACC AAA ATC CAT TTA AGA CAG AAG GAC 

TKIHLRQKD 



FIG. 5

3x2F ZGSL
SEQ ID NO: 25

1/1 31/11 

ATG GCA GAA CGC CCG TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC TTT TCT CGC TCG 
MAERPYACPVESCDRRFSRS 
61/21 91/31 

GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGC CAG AAG CCC TTC CAG TGT CGA ATC 
DELTRHIRIHTGQKPFQCRI 
121/41 151/51 

TGC ATG CGT AAC TTC AGT CGT AGT GAC CAC CTT ACC ACC CAC ATC CGC ACC CAC ACA GGT 
CMRNFSRSDHLTTHIRTHTG
181/61 211/71 

GGC GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT GAA CGC 
GEKPFACDICGRKFARSDER
241/81 271/91 

AAG AGG CAT ACC AAA ATC CAT ACC GGT GAA CGG CCG TAT GCT TGC CCT GTC GAG TCC TGC 
KRHTKIHTGERPYACPVESC
301/101 331/111 

GAT CGC CAC TTT TCT CGC TCG GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGC GGT
DRHFSRSDELTRHIRIHTGG
361/121 391/131 

TCT GGC CAG AAG CCC TTC CAG TGT CGA ATC TGC ATG CGT AAC TTC AGT GAT AGA AGC AAT 
SGQKPFQCRICMRNFSDRSN
421/141 451/151 

CTT GAA CGT CAC ACG AGG ACC CAC ACA GGC GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG 
LERHTRTHTGEKPFACDICG 
481/161 511/171 

AGG AAG TTT GCC AGG AGT GAT GAA CGC AAG AGG CAT ACC AAA ATC CAT TTA AGA CAG AAG 

RKFARSDERKRHTKIHLRQK 

541/181 

GAC 



FIG. 6

3x2F ZGLS
SEQ ID NO: 26

1/1 31/11 

ATG GCA GAA CGC CCG TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC TTT TCT CGC TCG 
MAERPYACPVESCDRRFSRS 
61/21 91/31 

GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGC CAG AAG CCC TTC CAG TGT CGA ATC 
DELTRHIRIHTGQKPFQCRI
121/41 151/51 

TGC ATG CGT AAC TTC AGT CGT AGT GAC CAC CTT ACC ACC CAC ATC CGC ACC CAC ACA GGC 
CMRNFSRSDHLTTHIRTHTG 
181/61 211/71 

GGT TCT GGC GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT 
GSGEKPFACDICGRKFARSD
241/81 271/91 

GAA CGC AAG AGG CAT ACC AAA ATC CAT ACC GGT GAA CGG CCG TAT GCT TGC CCT GTC GAG 
ERKRHTKIHTGERPYACPVE 
301/101 331/111 

TCC TGC GAT CGC CAC TTT TCT CGC TCG GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA 
SCDRHFSRSDELTRHIRIHT
361/121 391/131 

GGT GGC CAG AAG CCC TTC CAG TGT CGA ATC TGC ATG CGT AAC TTC AGT GAT AGA AGC AAT 
GGQKPFQCRICMRNFSDRSN 
421/141 451/151 

CTT GAA CGT CAC ACG AGG ACC CAC ACA GGC GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG 
LERHTRTHTGEKPFACDICG
481/161 511/171 

AGG AAG TTT GCC AGG AGT GAT GAA CGC AAG AGG CAT ACC AAA ATC CAT TTA AGA CAG AAG 

RKFARSDERKRHTKIHLRQK 

541/181 

GAC

SEQ ID NO: 27

1/1 31/11 

ATG GCA GAA CGC CCG TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC TTT TCT CGC TCG 
MAERPYACPVESCDRRFSRS 
61/21 91/31 

GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGT GGC CAG AAG CCC TTC CAG TGT CGA 
DELTRHIRIHTGGQKPFQCR 
121/41 151/51 

ATC TGC ATG CGT AAC TTC AGT CGT AGT GAC CAC CTT ACC ACC CAC ATC CGC ACC CAC ACA 
ICMRNFSRSDHLTTHIRTHT
181/61 211/71

GGT GGC GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT GAA 

GGEKPFACDICGRKFARSDE

241/81 271/91 

CGC AAG AGG CAT ACC AAA ATC CAT TTA AGA CAG AAG GAC 

RKRHTKIHLRQKD

[Figure: construction of the TFIIIA(F1-4)-ZIF clone. Templates: wtTFIIIA and
wtZIF, with Nde I and Not I sites marked.
PCR using primer pairs: A + a, B + b.
Overlap PCR: template fill-in, amplification with end primers: A + b.
Digest full-length product with Nde I + Not I, ligate into pCITE.]

[Figure: construction of the GAC-F4-ZIF clone. Templates: GAC-clone and
TFIIIA(F1-4)-ZIF, with Not I sites marked; finger labels recovered from the
drawing: F2, F3, F4, F5, F6, F7.
PCR using primer pairs: C + c, D + d.
Overlap PCR: template fill-in, amplification with end primers: C + d.
Digest full-length product with Nde I + Not I, ligate into pCITE.]
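
The overlap PCR steps in the two cloning schemes above join two PCR products whose ends share a common sequence. As a rough in silico picture (not a model of the reaction itself), the join can be sketched in Python as merging two same-strand strings at their longest shared suffix/prefix. The function name `join_by_overlap`, the `min_overlap` parameter and the 9-nucleotide overlap are hypothetical; the example fragments are cut from the junction region printed in SEQ ID NO: 53 and do not represent the actual primers or templates of the figures.

```python
def join_by_overlap(left, right, min_overlap=6):
    """Merge two same-strand sequences at the longest suffix of `left`
    that is also a prefix of `right`; raise if the overlap is too short."""
    left, right = left.upper(), right.upper()
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    raise ValueError(f"no overlap of at least {min_overlap} nt found")


# Illustrative fragments sharing the 9-nt junction CAGCTGCCG (an assumption).
FRAG_1 = "CATCAGTTCAGTCACACACAGCAGCTGCCG"
FRAG_2 = "CAGCTGCCGTATGCTTGCCCTGTCGAG"

if __name__ == "__main__":
    print(join_by_overlap(FRAG_1, FRAG_2))
```

The sketch ignores strand orientation and primer tails; it only shows why the inner primers of each pair must carry matching sequence for the end primers to amplify a single full-length product.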

TFIIIA(F1-4)-ZIF

SEQ ID NO: 53

1/1 31/11 

ATG GGA GAG AAG GCG CTG CCG GTG GTG TAT AAG CGG TAC ATC TGC TCT TTC GCC GAC TGC 
MGEKALPVVYKRYICSFADC
61/21 91/31 

GGC GCT GCT TAT AAC AAG AAC TGG AAA CTG CAG GCG CAT CTG TGC AAA CAC ACA GGA GAG
GAAYNKNWKLQAHLCKHTGE
121/41 151/51 

AAA CCA TTT CCA TGT AAG GAA GAA GGA TGT GAG AAA GGC TTT ACC TCG CTT CAT CAC TTA 
KPFPCKEEGCEKGFTSLHHL
181/61 211/71 

ACC CGC CAC TCA CTC ACT CAT ACT GGC GAG AAA AAC TTC ACA TGT GAC TCG GAT GGA TGT 
TRHSLTHTGEKNFTCDSDGC 
241/81 271/91 

GAC TTG AGA TTT ACT ACA AAG GCA AAC ATG AAG AAG CAC TTT AAC AGA TTC CAT AAC ATC 
DLRFTTKANMKKHFNRFHNI 
301/101 331/111 

AAG ATC TGC GTC TAT GTG TGC CAT TTT GAG AAC TGT GGC AAA GCA TTC AAG AAA CAC AAT 
KICVYVCHFENCGKAFKKHN

361/121 391/131 

CAA TTA AAG GTT CAT CAG TTC AGT CAC ACA CAG CAG CTG CCG TAT GCT TGC CCT GTC GAG 
QLKVHQFSHTQQLPYACPVE 
421/141 451/151 

TCC TGC GAT CGC CGC TTT TCT CGC TCG GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA 
SCDRRFSRSDELTRHIRIHT 
481/161 511/171 

GGC CAG AAG CCC TTC CAG TGT CGA ATC TGC ATG CGT AAC TTC AGT CGT AGT GAC CAC CTT 
GQKPFQCRICMRNFSRSDHL
541/181 571/191 

ACC ACC CAC ATC CGC ACC CAC ACA GGC GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG AGG 
TTHIRTHTGEKPFACDICGR
601/201 631/211 

AAG TTT GCC AGG AGT GAT GAA CGC AAG AGG CAT ACC AAA ATC CAT TTA AGA CAG AAG GAC 
KFARSDERKRHTKIHLRQKD



FIG. 15 



SUBSTITUTE SHEET (RULE 26) 



WO 01/53480 



PCT/GB01/00202 



16/27 



GAC-F4-ZIF 

SEQ ID NO: 54 

1/1 31/11 

ATG GCA GAA CGC CCG TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC TTT TCT CGC TCG 
MAERPYACPVESCDRRFSRS
61/21 91/31 

GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGC CAG AAG CCC TTC CAG TGT CGA ATC 
DELTRHIRIHTGQKPFQCRI
121/41 151/51 

TGC ATG CGT AAC TTC AGT GAT AGA AGC AAT CTT GAA CGT CAC ACG AGG ACC CAC ACA GGC 
CMRNFSDRSNLERHTRTHTG
181/61 211/71 

GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT GAA CGC AAG 
EKPFACDICGRKFARSDERK
241/81 271/91 

AGG CAT ACC AAA ATC CAT TTA AGA CAG AAG GAC AAC ATC AAG ATC TGC GTC TAT GTG TGC 
RHTKIHLRQKDNIKICVYVC
301/101 331/111 

CAT TTT GAG AAC TGT GGC AAA GCA TTC AAG AAA CAC AAT CAA TTA AAG GTT CAT CAG TTC 
HFENCGKAFKKHNQLKVHQF

361/121 391/131 

AGT CAC ACA CAG CAG CTG CCG TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC TTT TCT
SHTQQLPYACPVESCDRRFS 
421/141 451/151 

CGC TCG GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGC CAG AAG CCC TTC CAG TGT 
RSDELTRHIRIHTGQKPFQC 
481/161 511/171 

CGA ATC TGC ATG CGT AAC TTC AGT CGT AGT GAC CAC CTT ACC ACC CAC ATC CGC ACC CAC 
RICMRNFSRSDHLTTHIRTH 
541/181 571/191 

ACA GGC GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT GAA 
TGEKPFACDICGRKFARSDE
601/201 631/211 

CGC AAG AGG CAT ACC AAA ATC CAT TTA AGA CAG AAG GAC
RKRHTKIHLRQKD

ZIF-ZnF-GAC
SEQ ID NO: 55

1/1 31/11

ATG GCA GAA CGC CCG TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC TTT TCT CGC TCG 
MAERPYACPVESCDRRFSRS 
61/21 91/31 

GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGC CAG AAG CCC TTC CAG TGT CGA ATC 
DELTRHIRIHTGQKPFQCRI
121/41 151/51 

TGC ATG CGT AAC TTC AGT CGT AGT GAC CAC CTT ACC ACC CAC ATC CGC ACC CAC ACA GGC 
CMRNFSRSDHLTTHIRTHTG
181/61 211/71 

GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT GAA CGC AAG
EKPFACDICGRKFARSDERK
241/81 271/91

AGG CAT ACC AAA ATC CAT ACC GGT GAA CGG CCG TTC CAG TGT CGA ATC TGC ATG CGT AAC
RHTKIHTGERPFQCRICMRN 
301/101 331/111 

nc ACT ygy a ct AGC TCT CTT ACC AGC CAC ATC CGC ACC CAC ACA GGT GAG CGG CCG TAT

FSSSSSLTSHIRTHTGERPY 
361/121 391/131 

GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC TTT TCT CGC TCG GAT GAG CTT ACC CGC CAT 
ACPVESCDRRFSRSDELTRH 
421/141 451/151 

ATC CGC ATC CAC ACA GGC CAG AAG CCC TTC CAG TGT CGA ATC TGC ATG CGT AAC TTC AGT 
IRIHTGQKPFQCRICMRNFS 
481/161 511/171 

GAT AGA AGC AAT CTT GAA CGT CAC ACG AGG ACC CAC ACA GGC GAG AAG CCT TTT GCC TGT 
DRSNLERHTRTHTGEKPFAC 

541/181 571/191 
GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT GAA CGC AAG AGG CAT ACC AAA ATC CAT 

DICGRKFARSDERKRHTKIH 
601/201 

TTA AGA CAG AAG GAC 



FIG. 17 







[Figure: panels labelled GAC-F4-ZIF; label bsA (8 bp gap)]